METHOD FOR COMPACTING AN INTERMEDIATE OBJECT CODE PROGRAM EXECUTABLE
IN AN EMBEDDED SYSTEM PROVIDED WITH DATA PROCESSING RESOURCES, AND
CORRESPONDING COMPACTING SYSTEM AND MULTI-APPLICATION EMBEDDED SYSTEM.



The invention relates to a method and a system for compacting an
intermediate program.

The program is subjected to a search (1000) for identical sequences
(Si) and to a count of the number Ni of occurrences of each sequence
(Si). A test (1001) comparing whether a function f(Ni) exceeds a
reference value allows the creation (1003) of a specific instruction
having a specific code (Ci) with which the sequence (Si) is associated.
Each occurrence of the sequence (Si) is replaced (1004) by the specific
code (Ci) in the intermediate program, so as to generate a compacted
intermediate program (FCC) with which an execution file (FEX) is
associated.

Application to multi-application portable objects such as
microprocessor cards, embedded systems or the like.




[Figure: flowchart of the compacting method. Search of the intermediate
program for identical sequences Si and their number of occurrences Ni
(1000); creation of the specific instruction ISi; replacement of Si by
Ci in the intermediate program (1004).]





2785695 




METHOD FOR COMPACTING AN INTERMEDIATE OBJECT CODE PROGRAM EXECUTABLE
IN AN EMBEDDED SYSTEM PROVIDED WITH DATA PROCESSING RESOURCES, AND
CORRESPONDING COMPACTING SYSTEM AND MULTI-APPLICATION EMBEDDED SYSTEM

The present invention relates to a method for compacting an
intermediate object code program executable in an embedded system
provided with data processing resources, and to the corresponding
compacting system.

Present-day embedded systems provided with data processing resources
can perform increasingly complex and increasingly numerous functions,
owing to the ever closer match between the hardware making up these
portable objects and the software, or more particularly the programs or
applications installed in them, which confers on them one or more
specific functionalities. The notion of embedded system covers any
portable computer system, such as a portable object, microprocessor
card or the like, distinct from a conventional microcomputer.

This is in particular the case of microprocessor cards, also called
chip cards, such as shown in figure 1a, for which a compiler is used to
generate instructions and an interpreter is used to have these
instructions executed by the microprocessor, as shown in figure 1b.
Conventionally, as shown in figure 1a, a microprocessor card 10
comprises an input/output system 12 connected to the microprocessor 14,
a RAM 16, and a non-volatile memory 18 consisting of a read-only memory
(ROM) 18b and a programmable memory 18a. All these elements are
connected to the microprocessor 14 by a bus link. A data
encryption/decryption module 20 may be provided where appropriate.

The layout of all the application software elements, such as electronic
purse, electronic commerce or health, in the non-volatile programmable
memory, of the interpreter in non-volatile programmable memory or in
read-only memory, and of the operating system in ROM, is shown in
figure 1c.

The intermediate object code is generated by the compiler from a source
program, most often written in a high-level language using ASCII
characters. The source program and the corresponding intermediate
object code can be executed by all usual microprocessors, because the
interpreter provides the software adaptation of the standard
instructions of the intermediate object code into instructions directly
executable by the microprocessor.

By way of non-limiting example, microprocessor card manufacturers have
recently developed interpreters installed in ROM. This type of
interpreter sequentially reads a program or intermediate object code,
supporting an application for example, loaded for example into the
programmable memory of the microprocessor card. Each standard
instruction of this intermediate object code is interpreted by the
interpreter and then executed by the microprocessor. As a general rule,
the standard instructions of the intermediate object code handle
advanced functions such as arithmetic processing and object
manipulation. The notion of object here covers computing objects such
as lists, data arrays or the like.

However, owing in particular to the portable nature of these
microprocessor cards, their bulk and size are limited. The same applies
to the size of their programmable memory, which is, by construction,
limited to a few kilobytes. Such a structural limitation does not allow
large application programs to be implemented.

Furthermore, the current trend towards multi-application embedded
systems comes up against a prohibitive limit on the number of
applications that can be installed on the same embedded system or
microprocessor card, a number that rarely exceeds three applications.

The object of the present invention is to remedy the above-mentioned
drawback by implementing a method for compacting an intermediate object
code program, usable in an embedded system of the microprocessor card
type, in order to free memory space in the programmable memory of this
embedded system and thus allow at least one additional application to
be installed, after compaction of the latter.

A further object of the present invention is to provide a system for
compacting intermediate object code programs that allows a compacted
intermediate object code program to be installed in a multi-application
embedded system provided with data processing resources, which can
execute compacted intermediate object code programs without any notable
change in execution time and in a manner totally transparent to the
process inherent in each non-compacted application.

The method, which is the subject of the invention, for compacting an
intermediate object code program consisting of a series of standard
instructions, the embedded system being provided with a memory and with
an interpreter translating the language of the intermediate object code
program into instructions of an object code directly executable by a
microprocessor, and this program normally being stored in the memory of
the embedded system, is noteworthy in that the intermediate object code
program is searched for identical sequences of successive standard
instructions, and these identical sequences are subjected to a test
comparing whether a function of at least the number of occurrences of
these sequences in the intermediate object code program exceeds a
reference value. On a positive response to this test, for each
identical sequence of successive standard instructions satisfying the
test step, a specific instruction is generated by defining a specific
operation code and associating with this specific operation code the
sequence of successive standard instructions that satisfied the test.
In addition, in the stored intermediate object code program, each
occurrence of each sequence of successive standard instructions is
replaced by the specific operation code associated with it, to obtain a
compacted intermediate object code program, which is a succession of
standard instructions and specific operation codes. A decompaction
table is stored in the memory, establishing a one-to-one correspondence
between each specific operation code introduced and the sequence of
successive standard instructions associated with it. This process
optimizes the memory space occupied by the compacted intermediate
object code program by storing only one occurrence of the identical
sequences of successive standard instructions in the programmable
memory.

The method, the system for compacting an intermediate object code
program and the corresponding multi-application embedded system, which
are the subjects of the present invention, find application in the
technical field of embedded systems, more particularly in the
implementation and management of microprocessor cards.

They will be better understood on reading the description and examining
the drawings below in which, in addition to figures 1a to 1c relating
to the prior art:

- figure 2a shows a general flowchart illustrating a method for
compacting an intermediate object code program according to the present
invention;

- figure 2b shows a block diagram illustrating the implementation of
the various operators needed to obtain a compacted intermediate object
code program and the parameters allowing this program to be decompacted
or executed;

- figure 2c shows, purely by way of illustration, the layout in the
non-volatile programmable memory of a microprocessor card of this
compacted intermediate object code program and of its execution or
decompaction parameters;

- figure 3a shows, in a particular non-limiting embodiment, a diagram
illustrating the structure of a first file making up these execution or
decompaction parameters of the compacted intermediate object code
program;

- figure 3b shows, in a particular non-limiting embodiment, a diagram
illustrating the structure of a second file making up these execution
or decompaction parameters of the compacted intermediate object code
program;

- figure 4 shows, by way of illustration, the layout in non-volatile
programmable memory of a compacted intermediate object code program
according to the present invention, in a microprocessor card or
multi-application embedded system;

- figure 5 shows, by way of illustration, a specific implementation of
the method for compacting an intermediate object code program in which
the specific codes relating to distinct applications or intermediate
object code programs are updated;

- figures 6a and 6b show, in the form of functional elements, a system
for compacting an intermediate object code program according to the
present invention.

The method for compacting an intermediate object code program according
to the present invention will now be described with reference to figure
2a. In the present patent application, the designation intermediate
object code program covers any intermediate program.

This method will be described in a non-limiting manner for the case
where it is implemented in an embedded system consisting, for example,
of a microprocessor card such as shown in figure 1a, the intermediate
object code program being obtained in the conventional manner, as shown
in figure 1b, and the layout of a plurality of applications in
programmable memory, and of the interpreter and the operating system OS
in ROM, being shown in figure 1c, in a non-limiting manner.

The intermediate object code program consists of a series of standard
instructions executable by the microprocessor through the interpreter.

The method for compacting such a program consists, before the program
is installed in programmable memory 18a, in carrying out, as shown in
figure 2a, in a step 1000, a search of the intermediate object code
program for identical sequences of successive standard instructions,
these identical sequences being denoted Si. By identical sequence is
meant a series of a given number n of bytes liable to appear repeatedly
in the intermediate object code program. Thus the rank i of the
identical sequences denotes, for different values of i, distinct
sequences. In addition, the search step 1000 consists in determining
the number of occurrences Ni of each identical sequence Si. At the end
of the search step 1000, a plurality of identical sequences Si is
available, each sequence Si being distinct, together with a number Ni
representing the number of occurrences of each sequence Si in the
intermediate object code program.
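
The search of step 1000 can be sketched as follows for one fixed length
n. This is only an illustration under stated assumptions: the function
name, the use of Python lists and the sample byte values are
hypothetical and do not come from the description. The sketch records
the offsets of every n-byte window, so that Ni is simply the number of
offsets found for Si; overlapping occurrences are counted separately.

```python
from collections import defaultdict

def find_identical_sequences(program, n):
    """Map each n-byte sequence Si to the offsets where it occurs.

    Only sequences seen at least twice are kept; Ni = len(offsets).
    """
    offsets = defaultdict(list)
    for start in range(len(program) - n + 1):
        offsets[tuple(program[start:start + n])].append(start)
    return {seq: pos for seq, pos in offsets.items() if len(pos) >= 2}

# Hypothetical intermediate code, one integer per byte.
program = [10, 20, 30, 10, 20, 30, 40]
for seq, pos in find_identical_sequences(program, 2).items():
    print(seq, "Ni =", len(pos), "at offsets", pos)
```

Keeping the offsets rather than a bare count is a design choice of this
sketch: the same structure can later drive the replacement step.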

Following step 1000, the compacting method according to the present
invention consists in subjecting, in a step 1001, the identical
sequences of successive standard instructions Si to a test comparing a
function f(Ni) of at least the number of occurrences Ni associated with
an identical sequence Si with a reference value Ref. In figure 2a, the
comparison test is written:

f(Ni) > Ref.

When the response to test 1001 is negative, the function of at least
the number of occurrences Ni not exceeding the reference value, test
1001 is applied to the next identical sequence, of rank i+1, by
incrementing the index i in step 1002.

Steps 1000, 1001 and 1002 shown in figure 2a thus make it possible to
search the intermediate object code program for all the identical
sequences or series of bytes or, at the very least, for a given
significant number of these identical sequences, as will be described
later in the description.

On a positive response to test 1001, the compacting method according to
the present invention then consists in generating a specific
instruction, denoted ISi, by defining a specific operation code,
denoted Ci, and associating with this specific operation code the
sequence of successive standard instructions that satisfied the test,
namely the sequence of successive standard instructions Si. In figure
2a, the step of creating specific instructions is written:

ISi = Ci : Si.

It is noted that the step of defining a specific operation code and
associating with it the sequence of successive standard instructions Si
may consist in assigning a code value and associating this code value
with the instruction sequence Si, for example in the form of a list.

Following step 1003, the compacting method then consists, in step 1004,
in replacing, in the stored intermediate object code program, each
occurrence of the sequence of successive standard instructions Si by
the specific operation code Ci associated with it, to obtain a
compacted object code program, denoted FCC, which is a succession of
standard instructions and specific operation codes Ci.
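
Steps 1003 and 1004 can be illustrated by the short sketch below. It is
a minimal sketch, not the patented implementation: the dictionary used
to hold the association ISi = Ci : Si, the function name and the sample
values are assumptions made for the example, and every list element is
treated as one byte of intermediate code.

```python
def replace_sequence(program, sequence, specific_code):
    """Step 1004: replace each occurrence of `sequence` by `specific_code`."""
    program, sequence = list(program), list(sequence)
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    compacted, k = [], 0
    while k < len(program):
        if program[k:k + n] == sequence:
            compacted.append(specific_code)
            k += n                      # skip the whole replaced sequence
        else:
            compacted.append(program[k])
            k += 1
    return compacted

# Step 1003: a specific instruction associates a code Ci with a sequence Si.
specific_instructions = {106: [20, 30]}   # hypothetical values
program = [10, 20, 30, 40, 20, 30]
for code, seq in specific_instructions.items():
    program = replace_sequence(program, seq, code)
print(program)  # [10, 106, 40, 106]
```

The scan proceeds left to right and does not revisit replaced bytes, so
overlapping occurrences are replaced only once.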

The replacement process can then be repeated for each sequence or
series of identical standard instructions Si as long as the index i is
less than a number P of identical sequences, a test 1005 comparing the
index i with the value P allowing, on a positive response to this test,
a return to the previously described step 1002 of incrementing the
index i.

It will be understood in particular that, following the iteration of
the replacement process thus formed, a compacted object code program,
denoted FCC, is obtained, with which is associated a file for executing
it, denoted FEX, this execution file consisting at least of a
one-to-one correspondence between each specific code Ci and the
sequence of successive standard instructions Si.

Once the two files mentioned above have been obtained, namely the
compacted intermediate object code program and the execution file, on a
negative response to test 1005 for example, the compacted intermediate
object code program FCC obtained, and of course the execution file FEX,
can be stored, for example in the programmable memory 18a. This storage
may, in a non-limiting manner, be carried out in the non-volatile
memory 18, whether programmable memory 18a or even read-only memory
18b.

As regards the comparison test 1001, it is noted that the function of
at least the number of occurrences of each identical sequence Si can of
course be defined so as to optimize the compaction gain achieved. In
one non-limiting embodiment, this function can be set up so as to
compare the size, in number of bytes, of each identical sequence of
successive standard instructions with a threshold value, expressed for
example in number of standard instructions.
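
The description leaves the function f open. One possible choice, given
here purely as an assumption, is an estimate of the net number of bytes
saved: each of the Ni occurrences of an n-byte sequence shrinks to one
byte, while the sequence must be stored once, together with some table
overhead. The cost model, the parameter names and the default values
below are hypothetical.

```python
def compaction_gain(n_occurrences, seq_len, table_overhead=0):
    """Assumed f(Ni): bytes saved in the program minus bytes spent storing Si once."""
    saved = n_occurrences * (seq_len - 1)   # each occurrence becomes a 1-byte code
    spent = seq_len + table_overhead        # one stored copy plus bookkeeping
    return saved - spent

def passes_test_1001(n_occurrences, seq_len, ref=0, table_overhead=0):
    """Test 1001: f(Ni) > Ref."""
    return compaction_gain(n_occurrences, seq_len, table_overhead) > ref

print(passes_test_1001(2, 4))                    # True: 6 saved, 4 spent
print(passes_test_1001(2, 2))                    # False: 2 saved, 2 spent
print(passes_test_1001(2, 4, table_overhead=5))  # False once overhead is counted
```

With such a function, short sequences need many occurrences to pass the
test, whereas long sequences can pass with only two.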

Figure 2b describes, by way of illustrative example, a procedure for
generating a compacted intermediate object code program in accordance
with the method of the present invention.

First, the creator of the intermediate object code program produces a
text file containing the source program. This program, which the
creator writes in a high-level language, is generally written in ASCII
code so that it can be read easily and can contain comments which make
it easier, on the one hand, to understand and, on the other hand, to
debug. The source program thus obtained is fed into a compiler of
conventional type, called the standard compiler, whose role is to
transform each program line into executable instructions or, at the
very least, into interpretable instructions, so as to obtain an
intermediate object code program consisting of a series of standard
instructions that can be interpreted by the interpreter.

The intermediate object code file thus obtained from the compilation
process is fed into a compactor system implementing the compacting
method described above with reference to figure 2a. This compactor
system will be described later in the description.

The compaction process implemented as described above then yields a
file of interpretable instructions FCC, that is to say the file making
up the compacted intermediate object code program, and the execution
file FEX mentioned earlier in the description.

The operation of the compacting system will be described below for a
specific example of implementation.

First, the compactor system analyses all the standard instructions Is
and draws up a list of all the series of standard instructions existing
in the file making up the program.

If this file contains 1000 bytes for example, the compactor system
starts a procedure searching for all the series of at least two bytes,
up to a number Q for example. The search can be carried out for series
of two bytes, then three bytes, and so on up to Q bytes. In a preferred
embodiment, the number Q had the value 500.

Thus, for each instruction sequence Si formed by a series of standard
instructions Is, the compactor system determines whether this sequence
Si is already in the list. If so, the compactor system adds one to the
number of occurrences Ni of the sequence Si.
At the end of the above search process, the compactor system has thus
generated a complex list containing all the instruction sequences Si
examined, each sequence being associated with a number of occurrences
Ni in the intermediate object code program under consideration.

An illustrative table is given below for an intermediate object code
program consisting of the following series of instructions:

1-7-3-5-7-3-7-3-5-7.

In the illustrative example given in TABLE 1 below, this series
comprises ten instructions, each instruction being represented by one
byte and illustrated by a digit from 1 to 7, and the sequences of
successive instructions examined comprise 2, 3, 4 and then 5 bytes.

The sequences of successive instructions Si whose number of occurrences
in this intermediate object code program is greater than or equal to
two are given in the table below.
TABLE 1

4 bytes    [7-3-5-7]:2
3 bytes    [7-3-5]:2    [3-5-7]:2
2 bytes    [7-3]:3      [3-5]:2      [5-7]:2
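
The entries of TABLE 1 can be checked with a few lines of code. The
sketch below is only an illustration: the helper name is hypothetical,
and a simple sliding window is assumed for the search, which is one
plausible reading of the procedure described above.

```python
from collections import Counter

program = [1, 7, 3, 5, 7, 3, 7, 3, 5, 7]   # the ten-instruction example

def repeated_sequences(program, n):
    """Return {sequence: occurrences} for n-byte sequences seen at least twice."""
    windows = (tuple(program[k:k + n]) for k in range(len(program) - n + 1))
    return {seq: c for seq, c in Counter(windows).items() if c >= 2}

for n in (5, 4, 3, 2):
    found = repeated_sequences(program, n)
    row = "  ".join(f"[{'-'.join(map(str, s))}]:{c}" for s, c in found.items())
    print(f"{n} bytes  {row or '(none)'}")
```

No 5-byte sequence occurs twice in this program, which is why the table
starts at 4 bytes.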



Secondly, the compactor system replaces certain sequences Si of TABLE 1
by a specific instruction code.

The specific instruction code Ci is allocated in chronological order,
starting after the codes that correspond to standard instructions. In a
current intermediate object code, there are to date 106 standard
instructions, and the codes of these instructions lie between 000 and
105. The first specific instruction code Ci can then be the value 106,
the second the value 107, and so on. Each time identical instruction
sequences Si have been replaced by a new specific instruction code Ci,
and once that operation is complete, the list shown in the preceding
table is recalculated.

By way of non-limiting example, in the case where the 4-byte
instruction sequence shown in the preceding table, the sequence
7-3-5-7, is replaced and a corresponding specific code 106 is
allocated, the compacted intermediate object code program becomes:

1-106-3-106.

Under these conditions, there is no longer any sequence of standard
instructions Is and specific instructions IS occurring identically at
least twice. Of course, the file making up the compacted intermediate
object code program FCC and the file for executing or decompacting it
are stored in the compactor system.
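
The whole loop (search, allocation of the next code from 106 upwards,
replacement, recalculation of the list) can be sketched as follows. The
rule used to pick which sequence to replace first is an assumption of
this sketch, not something stated in the description: the sequence with
the largest estimated saving is chosen. All function names are
hypothetical.

```python
from collections import Counter

FIRST_SPECIFIC_CODE = 106  # standard codes occupy 000..105 in the description

def best_candidate(program, min_len=2):
    """Return the repeated sequence with the largest estimated saving, or None."""
    counts = Counter()
    for n in range(min_len, len(program) // 2 + 1):
        for k in range(len(program) - n + 1):
            counts[tuple(program[k:k + n])] += 1
    repeated = {s: c for s, c in counts.items() if c >= 2}
    if not repeated:
        return None
    # Assumed selection rule (not from the source): most bytes saved first.
    return list(max(repeated, key=lambda s: (repeated[s] * (len(s) - 1), len(s))))

def replace(program, seq, code):
    """Replace every non-overlapping occurrence of seq by code."""
    out, k = [], 0
    while k < len(program):
        if program[k:k + len(seq)] == seq:
            out.append(code)
            k += len(seq)
        else:
            out.append(program[k])
            k += 1
    return out

def compact(program):
    """Return (compacted program FCC, table {Ci: Si}); the list is recomputed each round."""
    program, table, code = list(program), {}, FIRST_SPECIFIC_CODE
    while True:
        seq = best_candidate(program)
        if seq is None:
            return program, table
        program = replace(program, seq, code)
        table[code] = seq
        code += 1

print(compact([1, 7, 3, 5, 7, 3, 7, 3, 5, 7]))
# ([1, 106, 3, 106], {106: [7, 3, 5, 7]})
```

On the example program the loop stops after one round, since 1-106-3-106
contains no sequence occurring twice. The sketch counts overlapping
windows and ignores the one-byte limit on code values; a real compactor
would have to handle both.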

After the compaction operation carried out by the compactor system, the
intermediate object code program proper, executable by the target
system, and the execution file FEX are available. The former contains
standard instructions Is and specific instructions IS, while the latter
comprises at least one table linking the specific codes Ci with the
series of standard instructions Si replaced by these specific codes. Of
course, these two files can be grouped into one and the same file for
transfer to the destination target system, that is to say the
microprocessor card intended to receive it.

As regards the execution file FEX, it is noted that it comprises at
least one file, denoted MEM-SEQ, consisting of a succession of several
fields such as a specific code field Ci and a sequence field Si, as
mentioned previously.

Following the above operation, the single file or, where applicable,
the two files mentioned above are transmitted to the target system and
processed directly by a loading program. This loading program is mainly
responsible for writing the data received into programmable memory 18a
or read-only memory 18b so that it can subsequently be executed
correctly.

By way of non-limiting example, the file containing the compacted
intermediate object code program FCC is stored without processing,
starting at a given address denoted ADR-MEM-PGM, in the programmable
memory 18a.

As regards the execution file FEX, the loading program analyses the
data of this file and dynamically creates a table, denoted TAB-PRO,
which associates the specific instruction codes Ci with the series of
instructions. In fact, the table TAB-PRO provides a one-to-one
correspondence between the specific instruction codes Ci and a location
address from which the corresponding instructions can be executed.
The layout, in the programmable memory 18a of the microprocessor card,
of the file supporting the compacted intermediate object code program
FCC, of the execution file FEX and of the file TAB-PRO mentioned above,
the latter file having been generated by the loading program, is shown
in figure 2c.

In this figure, whereas the table of standard instruction codes Is is
stored in the interpreter as a table TAB-STD, the execution file FEX
and the file TAB-PRO, which matches address jumps with the specific
instruction codes Ci, are by contrast stored in the programmable memory
18a; these two tables allow the compacted intermediate object code
program FCC actually to be executed by the microprocessor of the target
unit. An executable set is thus available, which runs through the
interpreter under the conditions described below.

Before the execution of a compacted intermediate object code program
FCC is described, a detailed description of the structure of the
execution file FEX and of the file TAB-PRO, and of the functional
relationship between them, will now be given with reference to figures
3a and 3b.

Figure 3a shows the execution file FEX in detail. As mentioned
previously, in addition to the fields for the specific codes Ci and the
instruction sequences Si, it comprises an end-of-macro-instruction
field, denoted FM, which in fact indicates the end of the sequence. In
one non-limiting embodiment, each specific code Ci can be written at
the start of the field, on one byte for example, and each corresponding
sequence Si is then written in a second field of variable length. The
end-of-macro code FM is of standard type and corresponds to the one
used by the conventional language indicated earlier in the description.
On receipt of the execution file FEX, whose data structure corresponds
for example to that shown in figure 3a, the various fields Ci, Si and
FM are processed separately.
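
The record structure of figure 3a (a one-byte code Ci, a
variable-length sequence Si, then the end-of-macro code FM) can be
sketched as below. The numerical value of FM is not given in the
description; the value 0xFF used here is purely hypothetical, as are
the function names. The parser also assumes that the FM value never
appears inside a sequence Si, which is a simplification.

```python
FM = 0xFF  # hypothetical end-of-macro value; the description does not give it

def build_fex(table):
    """Serialise {Ci: Si} as consecutive records: Ci (1 byte), Si, FM."""
    out = bytearray()
    for code, seq in table.items():
        out.append(code)
        out.extend(seq)
        out.append(FM)
    return bytes(out)

def parse_fex(fex):
    """Split a FEX byte string back into (Ci, Si) records."""
    records, k = [], 0
    while k < len(fex):
        code = fex[k]
        end = fex.index(FM, k + 1)      # assumes FM never occurs inside Si
        records.append((code, list(fex[k + 1:end])))
        k = end + 1
    return records

fex = build_fex({106: [7, 3, 5, 7], 107: [2, 4]})
print(list(fex))
print(parse_fex(fex))
```

A terminator keeps the records compact, at the cost of reserving one
byte value; a length-prefixed layout would be an alternative.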

First, the specific code Ci of the corresponding specific instruction
IS is written into the file TAB-PRO, and the instruction sequence Si
associated with this specific code, which makes up the specific
instruction, is written into a file or memory referenced MEM-SEQ
starting at an address denoted Adr-1. The code Ci of the corresponding
specific instruction is written at the address
TAB-PRO + 3 x (CODE - 106). In this relation, the address TAB-PRO is
the opening address of the file TAB-PRO, while the value CODE
represents the numerical value of the corresponding code Ci. Figure 3b
shows the corresponding procedure for an address TAB-PRO arbitrarily
equal to 0, the first specific code allocated having the value 106 and
the other specific codes allocated successively having the values 107
and following. In figure 3b, only four specific codes 106, 107, 110 and
120 have been shown for ease of understanding, the other memory spaces
being filled with arbitrary values.
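
The address relation TAB-PRO + 3 x (CODE - 106) can be evaluated for
the four codes mentioned for figure 3b. The function and constant names
in this sketch are hypothetical; only the formula, the base address 0
and the code values come from the description.

```python
FIRST_SPECIFIC_CODE = 106
ENTRY_SIZE = 3  # bytes per TAB-PRO entry, from the factor 3 in the relation

def tab_pro_entry_address(tab_pro_base, code):
    """Address of the TAB-PRO entry for a specific code Ci."""
    if code < FIRST_SPECIFIC_CODE:
        raise ValueError("standard codes have no TAB-PRO entry")
    return tab_pro_base + ENTRY_SIZE * (code - FIRST_SPECIFIC_CODE)

for code in (106, 107, 110, 120):
    print(code, "->", tab_pro_entry_address(0, code))
# 106 -> 0, 107 -> 3, 110 -> 12, 120 -> 42
```

Because the address is computed directly from the code value, the
interpreter needs no search to find the entry of a specific code.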

Under these conditions, Adr-i is the first available address in the
memory MEM-SEQ, this address corresponding to the address Adr-1 for the
first instruction sequence Si = S1. From this first address, which
constitutes the opening address of the file in the memory MEM-SEQ, the
instruction sequences Si are thus written sequentially in the order in
which they are loaded. The end-of-macro code FM is also written at the
end of the corresponding series.
After the above writing into the memory MEM-SEQ, and after a step
verifying that the write was correct, the loading program writes into
the table TAB-PRO, after each specific code Ci, the value of the
address at which the sequence was written in the memory MEM-SEQ. The
loading program then recomputes a new write address for the next
sequence Si, of rank i incremented or decremented depending on the
order in which the instruction sequences Si are traversed.
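
The work of the loading program can be sketched as follows. Several
details are assumptions made for the example and are not stated in the
description: the address following each code in TAB-PRO is taken to be
two bytes wide and big-endian (which fits the three-byte entry size),
the FM value 0xFF and the base address 0x0100 are arbitrary, and the
write-verification step is omitted.

```python
FIRST_SPECIFIC_CODE = 106
ENTRY_SIZE = 3   # 1 byte for Ci + 2 bytes for the address (assumed split)
FM = 0xFF        # hypothetical end-of-macro value

def load(records, mem_seq_base=0x0100, n_codes=16):
    """Build TAB-PRO and MEM-SEQ from (Ci, Si) records, in loading order."""
    tab_pro = bytearray(ENTRY_SIZE * n_codes)
    mem_seq = bytearray()
    next_addr = mem_seq_base                     # Adr-1 for the first sequence
    for code, seq in records:
        mem_seq.extend(seq)                      # write Si sequentially
        mem_seq.append(FM)                       # then the end-of-macro code
        entry = ENTRY_SIZE * (code - FIRST_SPECIFIC_CODE)
        tab_pro[entry] = code                    # Ci first
        tab_pro[entry + 1:entry + 3] = next_addr.to_bytes(2, "big")
        next_addr += len(seq) + 1                # new address for the next Si
    return tab_pro, mem_seq

tab_pro, mem_seq = load([(106, [7, 3, 5, 7]), (107, [2, 4])])
print(list(tab_pro[:6]))   # [106, 1, 0, 107, 1, 5]
print(list(mem_seq))       # [7, 3, 5, 7, 255, 2, 4, 255]
```

Here the first sequence starts at 0x0100 and occupies five bytes
including FM, so the second one starts at 0x0105.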

A process for executing a compacted intermediate object code program
supported by a file FCC as described above and containing specific
instructions will now be described with reference to figure 4.

Such a program is executed through the interpreter by means of an
instruction pointer, denoted PI. In fact, the instruction pointer PI
reads the code of the instruction to be executed, standard instruction
Is or specific instruction IS, and presents this code to the
interpreter, which then triggers the actions corresponding to it.

At the start of execution of a program, the instruction pointer PI is
loaded with the start address of this program, that is to say the
address ADR-MEM-PGM.

The interpreter analyses the value of the code read by the instruction
pointer PI. As part of this analysis, it determines whether this code
value corresponds to a code of standard type Cs or, on the contrary, to
a code of specific type Ci. This operation is carried out using the
table TAB-STD stored in the interpreter, which associates the standard
instruction codes, and hence the standard instructions Is, with the
execution addresses in its program.

If the value of the code read is not in that table, the interpreter
issues a read call to table TAB-PRO to check whether the code value
exists there. If the code read is not in that table either, the
interpreter is unable to execute the instruction read, and program
execution stops with an error message, which is not shown in the
flowchart of Figure 4.

In Figure 4, 2000 denotes the start of the execution operation, 2001
the initialization of the instruction pointer PI to the first
instruction of the program, and 2002 the reading of the instruction
pointed to by the instruction pointer PI. This operation in fact
corresponds to reading the code value mentioned above.

Likewise, at step 2003 of Figure 4, whether the code value read
belongs to the standard code table TAB-STD, and whether it belongs to
table TAB-PRO, together constitute test 2003, the instruction read
INS thereby being classified as either a standard instruction Is or a
specific instruction IS. The case in which the code read, and the
corresponding instruction, belong to neither table, which produces an
error message, is not shown in Figure 4 so as not to clutter the
drawing.

If, on a positive response to test 2003, the code read corresponds to
a specific instruction, the value of the instruction pointer PI
needed to read the next instruction is computed and saved on the
stack. The interpreter reads from table TAB-PRO the address of the
instruction sequence Si associated with the specific code Ci read,
and initializes the instruction pointer PI with that value. These
operations together bear reference 2004 in Figure 4. After step 2004,
the interpreter loops back to the code-reading step, as shown in
Figure 4 by the return to step 2002.

If, on a negative response to test 2003, the code read corresponds to
a standard instruction Is, the interpreter checks in a test step 2005
whether the value of this code corresponds to an end-of-macro value,
which in fact represents an end of sequence. If so, the value
previously saved in the stack memory is popped and the stack is
updated, this value being loaded into the instruction pointer PI. The
operation of popping the previously saved value, which constitutes a
return address, and then updating the stack is shown at 2006, the
return address being denoted ADR-RET. After step 2006, the
interpreter loops back to the step of reading the code value, that
is, step 2002. If, on a negative response to test 2005, the code read
corresponds to a standard instruction but not to an end of macro or
end of series, the code is executed by the interpreter in a manner
known per se. As shown in Figure 4, however, a test step 2007 is
provided in this case before the standard instruction is actually
executed. Test 2007 consists in verifying that the code value and the
corresponding instruction INS do not correspond to an end of program.
On a positive response to test 2007, step 2008, in which the
interpreter executes the instruction, is carried out; this execution
step is associated with a step of incrementing the instruction
pointer to the next instruction. After step 2008, the interpreter
loops back to the step of reading the code value pointed to by the
instruction pointer PI, that is, read step 2002.

On a negative response to test 2007, the instruction being an
end-of-program instruction, an end step 2009 is carried out. In this
case the interpreter stops and hands control to the operating system
OS, which then waits for a new command.
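The execution flow of Figure 4 can be sketched as a small interpreter loop. This is an illustrative sketch under several assumptions that are not stated in the description: every instruction is a single byte without operands, the program and the MEM-SEQ sequences live in one address space, "executing" a standard instruction merely records its code, and the codes FM = 105 and END = 0 are hypothetical. The function name `run` is likewise invented for the example.

```python
FM, END = 105, 0  # hypothetical end-of-macro and end-of-program codes


def run(memory, adr_mem_pgm, tab_std, tab_pro):
    """Execute a compacted program, following steps 2000 to 2009.

    memory  : bytes holding both the program and the MEM-SEQ sequences
    tab_std : set of standard codes (stands in for TAB-STD; includes FM, END)
    tab_pro : dict mapping specific code -> address of its sequence
    Returns the list of standard codes executed, in order.
    """
    trace, stack = [], []
    pi = adr_mem_pgm                       # 2001: PI <- start of program
    while True:
        code = memory[pi]                  # 2002: read code pointed to by PI
        if code not in tab_std:            # 2003: not a standard code
            if code not in tab_pro:        # in neither table: error
                raise ValueError(f"unknown code {code} at address {pi}")
            stack.append(pi + 1)           # 2004: save the return address
            pi = tab_pro[code]             #       and jump to the sequence
        elif code == FM:                   # 2005: end of macro?
            pi = stack.pop()               # 2006: PI <- ADR-RET
        elif code == END:                  # 2007: end of program?
            return trace                   # 2009: hand control back
        else:
            trace.append(code)             # 2008: execute the instruction
            pi += 1                        #       and advance PI


# Program at address 0, one sequence (3, 4, FM) stored at address 5.
memory = bytes([1, 106, 2, 106, END, 3, 4, FM])
print(run(memory, 0, {END, 1, 2, 3, 4, FM}, {106: 5}))  # [1, 3, 4, 2, 3, 4]
```

Because return addresses are kept on a stack, a sequence may itself contain a specific code, which corresponds to nested macro-instructions.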

The embodiment and implementation of the process for executing a
compacted intermediate object code program, as described above with
reference to Figure 4, are not limiting.

First, the stack memory may be subdivided into two separate stack
memories, one for the standard instructions Is and one for the
specific instructions IS, also called macro-instructions. In such an
embodiment, the maximum nesting depth of specific instructions IS
within a procedure is known. To obtain the total size occupied by
this stack, it suffices to multiply that depth by the maximum number
of nested procedures. Using a separate stack memory for the specific
instructions IS reduces total memory consumption compared with using
a single stack.

Furthermore, in order to increase the number of usable specific
instructions IS beyond the number available in the example given
earlier in the description, where specific codes are limited to the
range 106 to 255, the specific codes Ci may advantageously be coded
on two bytes. Under these conditions, a particular code value, such
as the value 255, can then indicate two-byte coding.
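A possible way of reading such codes is sketched below. The description only says that a value such as 255 can signal two-byte coding; the layout assumed here (the escape byte followed by one byte b, giving the extended code 256 + b) and the name `read_code` are hypothetical choices made for illustration.

```python
ESCAPE = 255  # value announcing a two-byte specific code, as in the example


def read_code(memory, pi):
    """Return (code, size) for the code starting at address pi.

    Assumed layout, for illustration only: a byte other than ESCAPE is a
    complete one-byte code; ESCAPE followed by a byte b denotes the
    extended code 256 + b.
    """
    first = memory[pi]
    if first != ESCAPE:
        return first, 1
    return 256 + memory[pi + 1], 2


print(read_code(bytes([106, 255, 7]), 0))  # (106, 1)
print(read_code(bytes([106, 255, 7]), 1))  # (263, 2)
```

With this layout, the interpreter would advance the instruction pointer by the returned size rather than always by one.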

Finally, the target system, when it is a multi-application embedded
system, comprises several compiled and compacted programs, that is,
several files FCC as described earlier. These programs must operate
independently. In such a case, since there is a single interpreter,
it executes all the application programs loaded by the loading
program. If two application programs use specific instructions, then
in the embodiment described earlier the compactor system may assign
the same specific code Ci to two different series of instructions.

To remedy this situation and allow the interpreter to distinguish the
two codes, the fields of the execution file FEX, as shown earlier in
Figure 3a, may be supplemented by a third parameter giving an
identification number of the application concerned. This
identification number is then also stored for each specific code
assigned in table TAB-PRO. This parameter in fact constitutes the
reference of the program loaded at the same time as the file
containing the table that associates each specific instruction code
Ci with the instruction sequence Si it replaces for the application
concerned. When the interpreter executes the application program, it
can thus discriminate the specific instructions relating to that
application.

Of course, the process described above for implementing a
multi-application embedded system has the drawback of increased
memory consumption, owing to the additional field holding the number
of the application concerned.

A more advantageous process will now be described with reference to
Figure 5.

With regard to Figure 5, consider an embedded system such as a
microprocessor card comprising several applications, denoted A1 to
Ak, the values A1 to Ak in fact constituting identification numbers
for each application. To this end, when any program or source
application with a given identification number, A1 to Ak-1 for
example, is compacted in accordance with the method of the present
invention as described earlier, the target system, that is, the
microprocessor card, transmits to the compactor the contents of
memory MEM-SEQ together, of course, with the corresponding specific
codes Ci. In fact, the target system recomputes, from the file or
table TAB-PRO and the contents of memory MEM-SEQ, a file of prior
specific codes, denoted F-C-ANT, relating to applications A1 to Ak-1.
File F-C-ANT provides the one-to-one correspondence between each
specific code Ci and its associated sequence Si for all applications
A1 to Ak-1. Under these conditions, and in a simplified, non-limiting
embodiment, file F-C-ANT may have the same format as the file FEX
mentioned above. In the preferred compaction process shown in Figure
5, the file F-C-ANT of prior specific codes is then communicated to
the compactor so that the compactor can learn from it.

When a new application, with identification number Ak, is compacted,
the compactor searches for all occurrences of the instruction
sequences Si already recorded in file F-C-ANT, that is, in effect, in
table TAB-PRO of the target system for the earlier applications A1 to
Ak-1. For each occurrence found, the compactor system replaces the
corresponding instruction sequence Si with the specific code Ci of
the corresponding specific instruction IS. Once this has been done,
the compactor system can then analyze the application with
identification code Ak and, of course, search for other occurrences
in order to create additional specific instructions that have not yet
been stored. File F-C-ANT can then be updated. The compaction process
described with reference to Figure 5 can be used particularly
advantageously to compact either programs loaded into the embedded
system for the first time, or programs loaded in addition to other
compacted programs already present in the embedded system.

In both of the above cases, the compaction method of the invention
consists in storing the execution table relating to at least one
compacted intermediate object code program, namely the first such
program in the first case and one or more existing compacted programs
in the second, and then, for any additional intermediate program, in
reading the stored execution table and compacting the additional
program while taking into account the specific instructions and codes
stored in the execution table, as described earlier. Of course, the
compacted intermediate object code program thus created can then be
executed only on the target system that previously supplied the
relevant file F-C-ANT to the compactor system.
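The first pass of this multi-application compaction, in which sequences already known to the target are replaced before any new analysis, can be sketched as follows. The sketch assumes one-byte instructions without operands and one-byte specific codes, and represents file F-C-ANT as a dictionary; the name `apply_prior_codes` is hypothetical.

```python
def apply_prior_codes(program, f_c_ant):
    """Replace sequences already known to the target with their codes.

    program : bytes of the new application (one-byte instructions assumed)
    f_c_ant : dict mapping specific code -> sequence bytes, standing in
              for the file F-C-ANT supplied by the target system
    """
    # Try longer sequences first so a short one does not break up a long one.
    for code, seq in sorted(f_c_ant.items(), key=lambda kv: -len(kv[1])):
        program = program.replace(seq, bytes([code]))
    return program


f_c_ant = {106: b"\x01\x02\x03", 107: b"\x01\x02"}
new_app = b"\x01\x02\x03\x09\x01\x02\x08"
print(list(apply_prior_codes(new_app, f_c_ant)))  # [106, 9, 107, 8]
```

After this pass, the remaining program would be searched for further repeated sequences, and any newly created codes would be added to F-C-ANT.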

For the implementation of the method for compacting an intermediate
object code program, any embedded system, such as a multi-application
portable object formed for example by a microprocessor card and
comprising computing resources such as a microprocessor, a
programmable memory, a read-only memory and a language interpreter,
comprises, with reference to Figure 2c introduced earlier, at least,
in addition to the table TAB-STD of the standard codes making up an
intermediate object code program, which is stored in the interpreter,
a set of files stored, for example, in the programmable memory 18a.

Thus, the corresponding portable object comprises at least one
compacted intermediate object code program, that is, the file FCC
shown in Figure 2c. This file may constitute either an application as
mentioned earlier, or a function such as a data
encryption/decryption function or the like. This compacted
intermediate object code file naturally consists of a series of
specific instruction codes Ci and of standard instruction codes
corresponding to the instruction codes of the intermediate object
code program. The specific instruction codes Ci correspond to the
sequences of successive standard instructions Si mentioned earlier.

In addition, as shown in Figure 2c, an execution table provides the
one-to-one correspondence between each specific operation code Ci
and its associated sequence of successive standard instructions Si.
Together, these files make it possible to optimize the memory space
occupied in the memory of the portable object, in particular the
programmable memory 18a.

As also shown in Figure 2c, the execution table comprises at least a
file of the successive sequences corresponding to the specific
instructions, designated as memory MEM-SEQ, and a table, designated
TAB-PRO, of the specific instruction codes and of the addresses at
which these specific instructions are located in the file of
successive sequences.

The compacted intermediate object code program is then executed as
shown in Figure 4.

A system for compacting an intermediate object code program, allowing
the compaction method described above to be implemented, will now be
described with reference to Figures 6a and 6b.

In general, the compaction system of the present invention will be
described as a combination of modules, which may be implemented
either in hardware or, preferably, in software, the data flows
between these modules being shown.

Thus, Figure 6a shows the compaction system of the present invention,
which is deemed to comprise at least a module A for analyzing all the
directly executable instructions making up the intermediate object
code program, denoted COD-OBJ-INT. In general, the computer file
holding the intermediate object code program is regarded as a string
of bytes, or character string, and the operation of the compaction
system will be described in terms of the corresponding string
processing.

From this byte string, the analysis module A, by reading the object
code program COD-OBJ-INT, discriminates and lists all the standard
instruction sequences Si contained in the program. In Figure 6a, the
standard instruction sequences S1, ..., Si-1, Si, Si+1, ..., Sp are
thus written symbolically as a list, using list notation. It will
thus be understood that the analysis module A may consist of a
sliding window corresponding to a number ni of bytes, this sliding
window allowing the sequences Si to be analyzed as mentioned earlier
with reference to Table 1 of the description. The sliding window in
fact discriminates each sequence Si by moving the byte string
relative to the window. For each occurrence of the sequence Si under
consideration, a count bit BC is delivered by the analysis module A.

As also shown in Figure 6a, the compaction system of the present
invention comprises a module C for counting the number of
occurrences, in the object code program, of each of the directly
executable instruction sequences Si mentioned above. The counting
module C may be implemented as a software module that counts the
number of successive bits at value 1 of the count bit BC. The
counting module C stores the occurrence counts N1, ..., Ni-1, Ni,
Ni+1, ..., Np of each corresponding sequence S1, ..., Si-1, Si, Si+1,
..., Sp. These counts may be stored in the form of a list.
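The roles of the analysis module A and the counting module C can be illustrated with a short sketch. It assumes a fixed window length n and one-byte instructions, and it counts overlapping occurrences, which a real compactor would have to treat more carefully; the name `count_sequences` is hypothetical.

```python
from collections import Counter


def count_sequences(program, n):
    """Slide a window of n bytes over the program and count each sequence."""
    counts = Counter()
    for start in range(len(program) - n + 1):
        # Each window position plays the role of one count bit BC.
        counts[bytes(program[start:start + n])] += 1
    return counts


program = bytes([1, 2, 3, 9, 1, 2, 3, 9, 1, 2])
counts = count_sequences(program, 3)
print(counts[bytes([1, 2, 3])])  # 2
print(counts[bytes([9, 1, 2])])  # 2
```

Running the function for several window lengths gives the list of candidate sequences and occurrence counts that the allocation step then evaluates.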

In addition, as shown in Figure 6a, a module AL is provided for
allocating, to at least one directly executable instruction sequence
Si, a specific code Ci associated with that sequence Si, so as to
generate a specific instruction, denoted ISi in Figure 6a, on the
criterion that a function of at least the corresponding number of
occurrences Ni exceeds a reference value, as mentioned earlier in the
description.

Where the function of at least the number Ni is greater than the
reference value, module AL delivers a compaction command COM-COMP,
which may consist of a bit set to the corresponding value 1 or 0.

Finally, the compaction system of the present invention comprises a
compaction module proper, COMP, which receives both the file of the
intermediate object code program COD-OBJ-INT and the compaction
command COM-COMP. The module COMP replaces, in the object code
program regarded as a byte string, each occurrence of any sequence Si
corresponding to a specific instruction ISi with the specific code Ci
associated with that instruction sequence.

As regards the operation of the compaction module COMP, it may
comprise a sliding-window read sub-module, similar to that of the
analysis module, for locating the standard instruction sequence Si in
the byte string. In practice, once the sequence Si has been located,
as illustrated in Figure 6a, the compaction module may comprise a
sub-module that partitions the string to the left and to the right of
the sequence Si, producing a left string, denoted LS, and a right
string, denoted RS. It may then comprise, using the specific code Ci
constituting the specific instruction ISi, a concatenation module
that concatenates the corresponding specific code Ci, regarded as a
byte string, to the left string LS for example, and then concatenates
the result to the right string RS, thereby replacing the sequence Si
with the specific code Ci. The compaction module COMP thus delivers a
compacted intermediate object code program, denoted COD-OBJ-INT-COMP
in Figure 6a. Of course, the compaction system shown in Figure 6a
allows the compaction process described above to be applied to all
the directly executable instruction sequences Si considered.
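The partition-and-concatenate operation of module COMP maps closely onto ordinary string handling. The sketch below is only an illustration, assuming a one-byte specific code and a non-empty sequence; the name `replace_sequence` is hypothetical, while `bytes.partition` is the standard Python method that splits a byte string around the first occurrence of a separator.

```python
def replace_sequence(program, seq, code):
    """Replace every occurrence of seq by the one-byte code, left to right."""
    out = b""
    rest = program
    while True:
        ls, found, rs = rest.partition(seq)   # left string LS, right string RS
        if not found:
            return out + ls                   # no further occurrence
        out += ls + bytes([code])             # concatenate LS and the code Ci
        rest = rs                             # continue searching in RS


compacted = replace_sequence(bytes([1, 2, 3, 9, 1, 2, 3]), bytes([1, 2, 3]), 106)
print(list(compacted))  # [106, 9, 106]
```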
As regards the allocation module AL, in a non-limiting embodiment it
may comprise, as shown in Figure 6b, a module, designated AL1 in
Figure 6b, for computing the length ni, in bytes, of the instruction
sequence Si. It may also comprise a computation module, denoted AL2,
for the product of this length ni and the number of occurrences Ni of
this standard instruction sequence Si. This product, denoted Pi, is
representative of the compaction gain for the directly executable
instruction sequence Si considered.

In addition, the allocation module AL may comprise a comparison
module, denoted AL3, that compares this product Pi with a given
threshold value, denoted S. The threshold value S may be determined
experimentally. It may also be derived from this experimental value
so as to correspond, for an intermediate object code program of given
length, to a given percentage of that length.

On a negative response to the comparison test performed by module
AL3, the rank i of the directly executable instruction sequence Si is
incremented by one, and the new value of i is sent back to the
analysis module A on the one hand and to the counting module C on the
other.

On a positive response to the comparison test performed by module
AL3, a module AL4 establishes a corresponding specific code Ci and,
finally, a module AL5 writes the specific code Ci and the directly
executable instruction sequence Si considered in one-to-one
correspondence, so as to constitute the specific instruction ISi.

As regards module AL4, it may be implemented as a software counting
module which, starting from an initial value, for example the value
106 mentioned earlier in the description, allocates a corresponding
value for the instruction sequence Si considered. Each specific
instruction ISi can then be written in the form of a corresponding
list.
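The chain of modules AL1 to AL5 can be summarized in a few lines. The sketch below is illustrative only: it takes the occurrence counts as a dictionary, uses the product P = n × N as the gain, and hands out one-byte codes from 106 up to 255 as in the example of the description; the function name `allocate_codes` and its parameters are hypothetical.

```python
def allocate_codes(counts, threshold, first_code=106, last_code=255):
    """Assign specific codes to sequences whose gain P = n * N exceeds S.

    counts    : dict mapping sequence bytes -> number of occurrences N
    threshold : the threshold value S
    Returns a dict mapping code -> sequence (the one-to-one table).
    """
    table = {}
    next_code = first_code                      # AL4: counter starting at 106
    for seq, n_occ in counts.items():
        length = len(seq)                       # AL1: length n in bytes
        gain = length * n_occ                   # AL2: product P
        if gain > threshold and next_code <= last_code:   # AL3: compare with S
            table[next_code] = seq              # AL5: record (code, sequence)
            next_code += 1
    return table


counts = {bytes([1, 2, 3]): 4, bytes([7, 8]): 1, bytes([5, 6]): 6}
print(allocate_codes(counts, threshold=10))
# {106: b'\x01\x02\x03', 107: b'\x05\x06'}
```

Here the sequences with gains 12 and 12 receive codes 106 and 107, while the sequence with gain 2 is left uncompacted.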

Real-time compaction trials on programs or applications contained in
microprocessor cards marketed by the company BULL CP8 in France have
shown a compaction gain greater than 33%. In practice this means
that, when the compaction process is applied to three applications
on a mobile portable object, roughly one additional application can
be accommodated on this type of object.

This compaction gain was obtained under substantially normal
conditions of use by the user, while the slowdown introduced by
calling macro-instructions, a slowdown inherent in the successive
read calls to table TAB-PRO and file MEM-SEQ, does not significantly
exceed 10% of the execution time without macro-instructions.

CLAIMS

1. Method for compacting an intermediate program consisting of a
series of standard instructions, used in an embedded system, the
embedded system being provided with a memory and with an interpreter
that translates the language of the intermediate program into
instructions of an object code directly executable by a
microprocessor, according to which method:

a) the intermediate program is searched for identical sequences of
successive standard instructions;

b) the identical sequences of successive instructions are subjected
to a test of whether a function of at least the number of
occurrences of these sequences in said intermediate program exceeds
a reference value and, on a positive response to said test, for each
identical sequence of successive standard instructions satisfying
said test step,

c) a specific instruction is generated by defining a specific
operation code and associating with this specific operation code
said sequence of successive standard instructions that satisfied
said test;

d) each occurrence of each sequence of successive instructions is
replaced in said intermediate program by the specific operation code
associated with it, to obtain a compacted intermediate program
consisting of a succession of standard instructions and specific
operation codes, and

e) an execution table is stored in said memory, providing the
one-to-one correspondence between each specific operation code
introduced and the sequence of successive instructions associated
with it, thereby optimizing the memory space occupied by said
compacted intermediate program by storing in said memory only one
occurrence of said identical sequences of successive instructions.

2. Method according to claim 1, characterized in that said function
is also a function of the size of each identical sequence of
successive instructions.

3. Method according to claim 1, characterized in that, for compacting
a plurality of intermediate programs, said method further consists
in:

- storing the execution table relating to at least one compacted
intermediate program and, for any additional intermediate program
subjected to a compaction process;

- reading said stored execution table, and

- compacting any additional program, taking into account the
specific instructions and codes stored in this execution table.

4. Method for executing a compacted intermediate program obtained by
implementing the compaction method according to claim 1, and
consisting of a succession of standard instructions and specific
operation codes stored in the memory of an embedded system,
characterized in that it consists in:

- recognizing in said memory the existence of a stored execution
table comprising at least one sequence of successive instructions
associated with a specific operation code in one-to-one
correspondence;

- calling, via the interpreter, a command to read the successive
standard instructions or specific operation codes of the compacted
intermediate program and, in the presence of a specific operation
code:

□ calling, by a read instruction in the memory, said sequence of
successive instructions associated with said specific operation code
and, in the presence of a standard instruction,

□ calling, by a read instruction, the execution of that instruction.

5. Method according to claim 4, characterized in that, when a
sequence of successive instructions associated with a specific
operation code is called, the current value of a program counter is
incremented in a stack associated with the specific operation codes,
and a program pointer points to the first instruction of said
specific instruction sequence; then, on execution of an
end-of-sequence instruction for the specific instructions, said
program counter is decremented, and execution continues from the
next instruction or specific operation code.

6. Method according to claim 5, characterized in that the stack
associated with the specific operation codes and the stack
associated with the standard instructions consist of a single stack.

7. Multi-application embedded system comprising computing resources,
a memory and an interpreter that translates the language of an
intermediate program into instructions directly executable by these
computing resources, characterized in that said multi-application
embedded system comprises at least, in addition to a table of the
standard codes making up said intermediate program, stored in said
interpreter:

- at least one compacted intermediate program, constituting an
application and consisting of a series of specific instruction codes
and standard instruction codes, said specific instruction codes
corresponding to sequences of successive standard instructions;

- an execution table providing the one-to-one correspondence between
each specific operation code and the sequence of successive standard
instructions associated with it, said at least one compacted
intermediate program and said execution table being stored in said
memory, thereby optimizing the memory space occupied by said
compacted intermediate program by storing in said programmable
memory only one occurrence of said identical sequences of successive
instructions.

8. Embedded system according to claim 7, characterized in that said
execution table comprises at least:

- a file of the successive sequences corresponding to the specific
instructions;

- a table of the specific instruction codes and of the addresses at
which these specific instructions are located in the table of
successive sequences.

9. Embedded system according to claim 8, characterized in that said
file of the successive sequences corresponding to the specific
instructions and said table of the specific instruction codes are
stored in the programmable memory of said embedded system.

10. System for compacting an intermediate program, this intermediate
program consisting of a series of standard instructions executable
by a target unit, characterized in that said system comprises at
least:

- means for analyzing all the executable standard instructions
which, by reading said intermediate program, discriminate and list
all the sequences of executable standard instructions contained in
this intermediate program;

- means for counting the number of occurrences, in this intermediate
program, of each of the sequences of executable standard
instructions in that list;

- means for allocating, to at least one sequence of executable
standard instructions, a specific code associated with that sequence
so as to generate a specific instruction;

- means for replacing, in the program, each occurrence of that
sequence of executable standard instructions with the specific code
associated with it, which represents said specific instruction,
thereby generating a compacted program comprising a succession of
executable standard instructions and specific instructions.

11. System according to claim 10, characterized in that said means for allocating,
to at least one sequence of executable standard instructions, a specific code
associated with this sequence of executable standard instructions so as to generate
a specific instruction comprise at least:

- means for computing the value of a function of at least the length and the number
of occurrences of this sequence of executable standard instructions, said function
being representative of the compaction gain for this sequence of executable
standard instructions;

- means for comparing the value of this function with a threshold value, and, on a
positive response to said comparison,

- means for writing to a file, in one-to-one correspondence, a specific code and
this sequence of executable standard instructions, so as to constitute said
specific instruction.
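
The means recited in claims 10 and 11 (analysis, counting, gain computation, threshold comparison, replacement) can be illustrated with a minimal Python sketch. It is not the patented implementation. The sequence length bounds, the form of the gain function, the code name "C0" and the function names find_sequences, gain, compact and expand are assumptions made for illustration, and only one sequence is replaced per call.

```python
from collections import Counter

def find_sequences(program, min_len=2, max_len=4):
    """Count every contiguous instruction sequence of min_len..max_len instructions.
    Overlapping occurrences are counted, which can overestimate the usable count."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(program) - n + 1):
            counts[tuple(program[i:i + n])] += 1
    return counts

def gain(seq, occurrences):
    """Assumed gain function of length and occurrence count: each occurrence shrinks
    from len(seq) instructions to one code; the sequence is stored once in the table."""
    return occurrences * (len(seq) - 1) - len(seq)

def compact(program, threshold=0, code="C0"):
    """Replace the most profitable sequence by a specific code if its gain exceeds
    the threshold. Returns the compacted program and the code-to-sequence table."""
    counts = find_sequences(program)
    best = max(counts, key=lambda s: gain(s, counts[s]), default=None)
    if best is None or gain(best, counts[best]) <= threshold:
        return list(program), {}
    out, i, n = [], 0, len(best)
    while i < len(program):
        if tuple(program[i:i + n]) == best:
            out.append(code)          # one specific instruction replaces the sequence
            i += n
        else:
            out.append(program[i])    # standard instruction kept as is
            i += 1
    return out, {code: best}

def expand(compacted, table):
    """Inverse mapping, as an execution file would allow at run time."""
    out = []
    for ins in compacted:
        out.extend(table.get(ins, (ins,)))
    return out

program = ["load", "add", "store", "jmp",
           "load", "add", "store", "ret",
           "load", "add", "store"]
compacted, table = compact(program)
```

Here the three-instruction sequence occurs three times, so it is replaced and the program shrinks from eleven instructions to five plus one table entry.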
1 Introduction

Computer programs that are powerful but small have always been desirable. Research has 
focussed on gains in speed as internal storage constraints recede into the background. 
Increasing chip densities and falling prices are encouraging larger binaries. However, this 
satisfies no one. Computer programmers and users often find that there is not enough 
memory. This is partly due to the quality of code produced by modern compilers, but some 
observers note that the current memory crunch began with the demand for feature-laden 
software systems [29]. In the end, smaller binaries need smaller system memories. All 
storage savings are a benefit if they do not result in slower computer systems. 

Internal storage is not the only reason to seek smaller code. Distributed systems 
depend more and more on code coming in from servers via a network. Smaller binaries 
decrease the time spent in communication. Stand-alone systems need less time to load from 
an internal disk. Some network hardware already reduces transfer times in this way by
compressing and decompressing data on board. Most personal computers now use
software-based compression to obtain more room on hard drives, but at a cost in 
performance. Can a program be written to exploit instruction code redundancy with an eye 
to preserving performance, and rewritten to take less room? The approach is attractive as it 
eliminates the decompression stage. The compressor output is an executable binary, not a 
smaller file that would be unreadable without the appropriate decompressor. 

We have implied that the code generators in today's compilers focus on increasing 
speed because memory is not the main issue. This is slightly unfair as some compiler 
optimizations for speed also result in less code. However, the general observation is that 
there is a trade-off between space and speed optimizations. Even if attention is paid to 
space, today's compilers generate code procedure by procedure as interprocedural data 
flow analysis is still expensive. The code within procedures is of a high quality, but 
achieving further space optimizations becomes expensive. Systems with multiple modules 
add other complications during analysis. 
Research has shown that compilers generate code sequences that appear many times
throughout the final object file [8][21][9]. We can reduce the size of the object file by
keeping only one copy of the sequence, and then replacing all other copies with a procedure 
call. This technique is called procedural abstraction, and is one method for reducing the 
size of binaries. Throughout the thesis, we use the term compaction to describe the space 
savings achieved through procedural abstraction because the smaller binary is immediately 
executable, thereby distinguishing the results from compressed binaries produced by such 
UNIX utilities as compress or gzip; the compressed binaries must be decompressed 
before execution. 

We expect to see many more close instruction sequence matches than exact instruction 
sequence matches. By parameterizing the differences, we can replace all copies with one 
procedure. Most compaction schemes in the literature acknowledge the possibility of using 
inexact matches by parameterizing the replacement procedures, but because of the cost of 
extra storage instructions to copy data into and out of parameters, parameters are ruled out. 
They therefore constrain instruction matches to be exact. However, if instruction sequences 
are grouped together such that each group is covered by a procedure with the smallest 
number of parameters, then greater space savings than parameterless schemes are expected. 
This raises two questions: how do we group instruction matches into procedures without
performing an exhaustive search of the solution space, and how much greater are the
resulting space savings?

Chapter 2 outlines the previous work on reducing the size of executable code. Chapter 
3 explains the compaction and parameterization problem in greater detail. Chapter 4 is a 
treatment of procedure parameterization, including heuristics for assigning similar sections 
of code to procedures and selecting procedure parameters with an eye towards good space 
savings. Chapter 5 summarizes our results and suggests future research directions. 
2 Previous Work

As mentioned in the introduction, compiler optimizations for space have received less 
attention in recent years than those for speed. There are far fewer published papers for the 
former. Some of the compiler optimizations for speed also save space, and some of these 
are reviewed. Other techniques use the results of information theory by choosing 
instruction and data representations that result in the smallest executable images given the 
statistical properties of a set of target programs; one such approach will be covered. 
However, assuming that we do not want to add compiler optimizations that transform and 
rearrange the internal representation, and cannot entertain the complexity and expense of 
adding hardware that decodes custom instruction sets, then space savings gains are 
achieved through procedural abstraction. Sections of executable code that match each other 
can be replaced by subroutine calls, replacing the many sections with one subroutine body. 

2.1 Compiler Optimizations Leading to Space Savings 

A compiler's back end converts front-end output (the intermediate representation, or IR) 
into executable code. Early compilers emitted machine code that was far less efficient than 
hand coded equivalents. Successive compiler writers employed heuristic solutions that 
improved code quality. These were applied at all compiler stages. Some transformed the 
source statements and others the machine code, but most concentrated on the IR. In the late 
1960s and early 1970s, IR transformations called optimizations began to attract more 
attention, were seriously studied, and, in some cases, were formalized. Most of these 
optimizations aim to increase the speed of code and, since a guiding intuition is that "the 
less code there is, the less time is spent executing," space savings are often an extra benefit. 
Compiler optimizations for modern high performance architectures show that the intuition 
is not always true, especially in many loop optimizations for code intended for parallel 
architectures. However, many of the optimizations performed within current compilers 
continue simultaneously to increase code speed and to reduce code size. 
2.1.1 Common Subexpression Elimination (CSE)

Ever since it was first published, this optimization has remained one of the most profitable 
[7]. Source code expressions which appear in different parts of a program or procedure, and 
whose argument values do not change between appearances, can be replaced by a 
temporary variable. In the example from Table 2.1, the expression "x * y" appears several
times in the C fragment. At each appearance of the expression, the values of x and y have
not changed. Recomputing the expression is unnecessary. It is also a subexpression of "x
* y * z". This larger expression, which appears again in the last statement, cannot be
replaced by a temporary variable. The last argument, z, changes value, and the larger
expression must be recomputed at "f". At no point are the values of x and y changed, so
the temporary T1 may be used throughout the example. A machine code version of the
transformation is shown. Two machine instructions are saved, or 20% of the unoptimized 
code. 



Table 2.1 Common subexpression elimination: example

                 Before                 After

pseudo source    c = x * y;             T1 = x * y;
                 d = x * y * z;         c = T1;
                 z = 2;                 d = T1 * z;
                 e = x * y;             z = 2;
                 f = x * y * z;         e = T1;
                                        f = T1 * z;

assembler        ld  r1, (x)            ld  r1, (x)
                 ld  r2, (y)            ld  r2, (y)
                 mul r3, r2, r1         mul r3, r2, r1
                 ld  r4, (z)            ld  r4, (z)
                 mul r5, r2, r1         mul r5, r3, r4
                 mul r5, r5, r4         ld  r4, #2
                 ld  r4, #2             ld  r6, r3
                 mul r6, r2, r1         mul r7, r3, r4
                 mul r7, r2, r1
                 mul r7, r7, r4
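
The idea behind the transformation can be sketched in Python for a single straight-line block. This is a simplified local form of CSE written for illustration, not the algorithm of [7]: the statement encoding (a destination paired with either a name, a constant, or an (operator, argument, argument) tuple) and the name local_cse are assumptions, and instead of introducing a fresh temporary the sketch reuses the first variable that holds the value.

```python
def local_cse(block):
    """block: list of (dest, rhs); rhs is a name, a constant, or (op, arg1, arg2)."""
    available = {}   # expression tuple -> variable currently holding its value
    out = []
    for dest, rhs in block:
        if isinstance(rhs, tuple) and rhs in available:
            rhs = available[rhs]      # reuse the earlier result instead of recomputing
        out.append((dest, rhs))
        # Assigning dest invalidates expressions that read dest or are held in dest.
        available = {e: v for e, v in available.items()
                     if dest not in e[1:] and v != dest}
        if isinstance(rhs, tuple) and dest not in rhs[1:]:
            available[rhs] = dest
    return out

# Three-address form of: c = x*y; d = x*y*z; z = 2; e = x*y; f = x*y*z
block = [
    ("c", ("*", "x", "y")),
    ("t", ("*", "x", "y")),
    ("d", ("*", "t", "z")),
    ("z", 2),
    ("e", ("*", "x", "y")),
    ("u", ("*", "x", "y")),
    ("f", ("*", "u", "z")),
]
optimized = local_cse(block)
multiplications = sum(isinstance(rhs, tuple) for _, rhs in optimized)
```

Six multiplications become three. The products involving z are not merged, because z is reassigned between them.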











Although space savings through CSE can be substantial, the optimization must still be 
carefully applied. For example, some numerical algorithms depend on the rounding 
behavior of a floating point representation. A small difference between the representation 
expected by the algorithm and that forced by the compiler may cause an algorithm to
produce incorrect results. This occurs if the algorithm no longer terminates because of the
change, or if the change results in an algorithm with an error bound exceeding algorithm 
specifications [10]. If the rounding method were to be changed after the calculation of d in 
the example above, the expression assigned to e would need to be recalculated. Replacing 
the second calculation of "x * y" by a load from the temporary variable would then be a 
modification of the algorithm. This is unacceptable, as any compaction of an executable 
must avoid changing the meaning of that program. 

Another case where savings may be lost occurs if there are significant demands on 
registers (register pressure). As registers are a precious high-speed resource, their
allocation is important and difficult. If a CSE transformation stores temporary results in a 
register, then the number available decreases, leading to situations where the register 
requirements exceed supply. Spill code is the result, increasing the size of the program [3]. 
Therefore, if not used wisely, CSE may decrease speed and increase space requirements.

2.1.2 Peephole Optimization 

This transformation examines and modifies the machine code generated by the compiler 
[20]. Despite the excellence of code generated for separate functions and procedures, 
looking at all of the code at one time may expose redundancies and inefficiencies. These 
usually occur at locations where the code for one procedure ends and another begins. 
Redundancies are removed by replacing them with equivalent, shorter instruction 
sequences. In essence, many different optimizations are applied, such as identification of 
redundancies, "unreachable code", "flow of control" optimizations, but in a very local 
region and at a low level [1]. 

A small window (the peephole) passes over the code, limiting improvements to the 
code visible within the window. A few examples are shown in Table 2.2. Example 1 
replaces a load-store pair with a single load (no need to store a value already known to be 
in memory). Example 2 removes the unconditional branch: the destination is the next 
instruction. (This code was probably generated for the last option of a "case" statement.) 
The final example changes the conditional branch/unconditional branch pair into an
equivalent instruction. Static peephole optimizations nearly always result in space savings. 



Table 2.2 Peephole optimization: examples

     Before              After

1        ld  X, r5           ld  X, r5
         st  r5, X

2        br  A           A:
     A:

3        beq A               bne B
         br  B           A:
     A:
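
A window-based pass covering the three examples of Table 2.2 might look like the following Python sketch. The tuple encoding of instructions, the INVERSE table and the function name peephole are assumptions made for illustration. Real peephole optimizers carry many more rules and must also respect side effects such as volatile memory, which this sketch ignores.

```python
INVERSE = {"beq": "bne", "bne": "beq"}   # assumed pairs of opposite conditional branches

def peephole(code):
    """code: list of tuples such as ("ld", src, dst), ("st", src, dst),
    ("br", label), ("beq", label), ("label", name)."""
    code = list(code)
    changed = True
    while changed:                       # repeat until no rule applies anywhere
        changed = False
        i = 0
        while i < len(code):
            a = code[i]
            b = code[i + 1] if i + 1 < len(code) else None
            c = code[i + 2] if i + 2 < len(code) else None
            if b and a[0] == "ld" and b == ("st", a[2], a[1]):
                del code[i + 1]          # rule 1: store of a value just loaded from there
                changed = True
            elif b and a[0] == "br" and b == ("label", a[1]):
                del code[i]              # rule 2: branch to the next instruction
                changed = True
            elif (b and c and a[0] in INVERSE and b[0] == "br"
                  and c == ("label", a[1])):
                code[i:i + 2] = [(INVERSE[a[0]], b[1])]   # rule 3: invert the condition
                changed = True
            else:
                i += 1
    return code
```

Because a label is a separate element of the list, a store that is itself a branch target is never adjacent to the preceding load, so rule 1 does not fire across a label.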





2.1.3 Unreachable Code Elimination 

Programmers often insert debugging statements into code, surrounding them with 
conditional statements. The conditional expression tests a programmer-specified compile- 
time constant, and the code is or is not executed at run-time depending on the value of the 
constant. Some program generators may emit statements with certain parts enabled or 
disabled, and this usually depends on programmer specified compile-time constants. If the 
language preprocessor does not support the removal of statements from the stream sent to 
the compiler (e.g. "#ifdef ... #endif" in C), then the final program will contain
sections that will never be executed. This code is called unreachable. Eliminating it from
the program will not change the output, and may save considerable space [2].

2.1.4 Useless Code Elimination 

In the 1970s, data flow analysis methods were formalized and used in compilers. 
Relationships between the values of variables are found through these methods. One 
example is live variable analysis [12]. A variable is said to be live at a specific point in the 
program if its value is used after that point at least once on some execution path. If the 
variable on the left hand side of an assignment is not live after that operation, then the 
computations on the right hand side may be eliminated, including the assignment. Since 
there is no longer a need to compute the expression or store its value, space can be saved
by omitting both the operation and the assignment from the final executable [1].
Unreachable and useless code are sometimes referred to as dead code.
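
For a single straight-line block, the live-variable test can be sketched in Python as a backward pass. The statement encoding (a destination paired with the tuple of variables it reads), the live_out parameter and the name eliminate_useless are assumptions made for illustration. The sketch also assumes that right hand sides have no side effects. A full implementation would run the analysis over the whole control flow graph.

```python
def eliminate_useless(block, live_out):
    """block: list of (dest, uses); live_out: variables needed after the block."""
    live = set(live_out)
    kept = []
    for dest, uses in reversed(block):   # liveness is computed backwards
        if dest in live:
            live.discard(dest)           # dest is defined here ...
            live.update(uses)            # ... and its operands become live
            kept.append((dest, uses))
        # otherwise dest is not live after this statement, so it is dropped
    kept.reverse()
    return kept

block = [
    ("a", ("x", "y")),
    ("b", ("a",)),
    ("c", ("x",)),      # c is never used afterwards
    ("d", ("b",)),
]
result = eliminate_useless(block, live_out={"d"})
```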

2.1.5 Code Hoisting

A generalized form of code motion, code hoisting moves very busy expressions to a point where space
is saved, but not necessarily time. Intuitively, an expression is very busy at a specific point 
in a program's execution if (a) it will be evaluated on all possible execution paths, and (b) 
all evaluations occur before any change in the value of an expression argument. That is, 
there is at least one evaluation of the expression, no matter what control flow path is taken 
after the very busy point. Instructions required to compute the expressions are removed 
from each of the evaluation locations, and are replaced there by an assignment from a 
temporary variable. At the very busy point, one copy of the code for the expression is 
inserted, and the result is assigned to the temporary variable [1]. 

In the pseudocode example from Table 2.3, the exponentiation expression "c ** d"
is very busy at a point just before the "if" statement. Every possible flow of control from
this point to the bottom of the fragment will evaluate the expression. As exponentiation
may require many instructions, a space savings occurs when the code is hoisted to the very
busy point. Line counts are deceptive in this example: code hoisting is sometimes mistaken
for a general form of loop-invariant code motion. However, what is important is that the
body of code needed to calculate "c ** d" appears four times in the code before it is
transformed, and only once after.

The results are similar to CSE, but the motivation for code hoisting is space saving, 
not speed improvement. 

2.1.6 Copy Propagation 

Copy propagation eliminates redundant names for the same value (see example in Table
2.4). In turn, fewer variables may be needed. Although this appears to be the substitution 
of many variables by one variable, which by itself does not reduce the number of 
instructions, copy propagation directly enables other space saving optimizations. Register
pressure is lightened as fewer values need to be kept in registers, reducing the contribution
of register spill code to the executable's size [3].

Table 2.3 Code hoisting: example

Before:

    if ( expr1 ) {
        switch ( expr2 ) {
            case 1:  a = c ** d;
                     break;
            case 2:  a = c ** d;
                     a = a ** 2;
                     break;
            default: b = c ** d;
                     break;
        }
        a = b * 2;
    } else {
        a = c ** 4;
        f = c ** d;
    }
    print a;

After:

    T1 = c ** d;
    if ( expr1 ) {
        switch ( expr2 ) {
            case 1:  a = T1;
                     break;
            case 2:  a = T1;
                     a = a ** 2;
                     break;
            default: b = T1;
                     break;
        }
        a = b * 2;
    } else {
        a = c ** 4;
        f = T1;
    }
    print a;



Table 2.4 Copy propagation: example

Before:

    a = x * x;
    y = a;
    z = foo( y - 4 );
    z = a;
    b = foo( z - 9 );

After:

    a = x * x;
    z = foo( a - 4 );
    b = foo( a - 9 );
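
A minimal Python sketch of copy propagation over one straight-line block follows. The statement encoding (a destination paired with either a name, a constant, or a tuple of an operator and its arguments) and the name copy_propagate are assumptions made for illustration. The sketch only rewrites uses. The copies that are left without readers would be removed by a separate useless code elimination pass.

```python
def copy_propagate(block):
    """block: list of (dest, rhs); rhs is a name, a constant, or (op, arg, ...)."""
    copies = {}   # variable -> the variable it is currently a copy of
    out = []
    for dest, rhs in block:
        if isinstance(rhs, tuple):
            rhs = (rhs[0],) + tuple(copies.get(a, a) for a in rhs[1:])
        else:
            rhs = copies.get(rhs, rhs)
        # dest is redefined: forget every copy relation that involves it
        copies = {k: v for k, v in copies.items() if k != dest and v != dest}
        if isinstance(rhs, str) and rhs != dest:
            copies[dest] = rhs           # dest is now another name for rhs
        out.append((dest, rhs))
    return out

# Three-address form of the "before" fragment of Table 2.4
block = [
    ("a", ("*", "x", "x")),
    ("y", "a"),
    ("t1", ("-", "y", 4)),
    ("z", ("foo", "t1")),
    ("z", "a"),
    ("t2", ("-", "z", 9)),
    ("b", ("foo", "t2")),
]
propagated = copy_propagate(block)
```

After the pass, both subtractions read a directly, so y is no longer used anywhere in the block.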



2.1.7 Leaf-Procedure Optimization 

If a procedure does not call any other routines, it is said to be a leaf-procedure because these 
appear as leaves in the call graph. Several observations lead to space optimizations [3]. For 
instance, the register which holds the return address need not be saved or restored in 
anticipation of another procedure call, thereby saving two instructions. Allocating stack
space is unnecessary if no local variables are used, eliminating the code needed to set up
the stack frame. 

2.1.8 Parameter Promotion 

In some languages such as FORTRAN, all procedure parameters use the "call-by- 
reference" mechanism, and their values are read from memory by the procedure. 
Computations need these values, so they are typically held in registers. If the procedure 
calls other procedures, and passes in some of the same parameters, the memory reads are 
duplicated. Register demands will increase because the compiler cannot determine that the 
data is already stored in registers. Why read the same memory locations twice? Why store 
the newly loaded values back into memory because of register spillage? Instead we can use 
the register itself to pass in the data to procedures receiving the parameter. Changes to the 
parameters can be applied to the registers, and saved back to memory when exiting the 
procedure level that first received the parameter. 

Readers interested in detailed examples of leaf procedure and parameter promotion 
optimizations are encouraged to refer to the "Procedure Call Transformations" section in the
survey of optimization techniques for high-performance computers by Bacon, Graham and 
Sharp [3]. 

2.1.9 Span-Dependent Instruction Optimization 

Unlike RISC opcodes, CISC opcodes vary in size. For instance, branch instructions with 
nearby destination addresses may need only one or two bytes to indicate a destination 
address. Far destinations require more than two bytes to specify the address, extending the 
length of branch instructions to three or four bytes. If code blocks are carefully placed, then
the contribution of the branch instruction sizes to executable size can be minimized. Ideally, all
branches would require only two bytes, but as this is not always possible, algorithms exist
that maximize the number of short branches [27]. The larger the number of short branches, the greater
the space savings. 
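
One simple way to choose branch sizes for a fixed block order is to start with every branch short and lengthen only those whose target is out of reach, repeating until nothing changes. The Python sketch below illustrates this fixed-point idea. It is not necessarily the algorithm of [27], and the sizes SHORT and LONG, the reach SHORT_REACH, the item encoding and the name size_branches are all assumptions.

```python
SHORT, LONG = 2, 4      # assumed branch sizes in bytes
SHORT_REACH = 127       # assumed maximum displacement of a short branch

def size_branches(items):
    """items: list of ("code", nbytes), ("label", name) or ("br", target).
    Returns the indices of branches that must be long and the total size."""
    long_branches = set()
    while True:
        # Lay out the code with the current choice of branch sizes.
        addr, labels, branch_at = 0, {}, {}
        for idx, item in enumerate(items):
            if item[0] == "label":
                labels[item[1]] = addr
            elif item[0] == "code":
                addr += item[1]
            else:
                branch_at[idx] = addr
                addr += LONG if idx in long_branches else SHORT
        # Lengthen every short branch whose target is out of reach.
        grew = False
        for idx, at in branch_at.items():
            if idx in long_branches:
                continue
            displacement = labels[items[idx][1]] - (at + SHORT)
            if abs(displacement) > SHORT_REACH:
                long_branches.add(idx)
                grew = True
        if not grew:                     # branches only grow, so this terminates
            return long_branches, addr

cascade = [("br", "L2"), ("code", 120), ("br", "L3"), ("code", 4),
           ("label", "L2"), ("code", 200), ("label", "L3")]
```

In the cascade example the second branch must be long, and lengthening it pushes the target of the first branch just out of short range, so both end up long.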
2.2 Exploiting Similarity of Code

There was much research into compiler optimizations in the late sixties and early seventies. 
However, optimization algorithms were still described by their implementations. 
Identifying common links between different transformations was difficult. 

A formulation describing optimization techniques in a machine independent manner 
was developed by Geschke [9]. It expressed the optimizations through the relationships 
between source language statements. Statement orderings and dependencies are found: e.g. 
whether statement B comes before statement A in the source text; whether B is needed by 
A; and whether the order of computing A and B is invariant. Most optimizations of the day 
could then be specified and implemented in a programming language via these relations 
[30]. 

One of the intriguing ideas in the dissertation by Geschke is the strongly similar 
subroutine optimization. It appears promising, but seems to have not been pursued beyond 
the dissertation. Two sub-trees in the intermediate representation (which is itself a tree) are 
arguments for a similarity function. This function's return value indicates the desirability 
of covering the two sub-trees with the code of one subroutine, with low values indicating 
good space savings potential. Put another way, the value is an indication of the cost of 
providing one subroutine body that will implement two different portions of the code. 
Several similarity functions are presented; one takes the parameter costs into consideration; 
another returns larger values if the differences between the subtrees occur in the middle of 
code, and low values if the differences are at the beginning or end. 

Finding sub-trees that are close matches and suitable for subroutines is not the same 
as actually turning them into subroutines. As Geschke phrases it, "what is desirable may 
not be feasible." Procedural abstraction, or more accurately "procedural extraction", must 
build the subroutine such that space and/or time of the emitted code is lower than if the 
subroutine was not formed. Even though the dissertation discussed the promise of the 
approach, the optimization was not included in a subsequent related work [30]. Other 
researchers have picked up on the idea of procedural abstraction, but there are still other
ways of exploiting code redundancy to save space. 

2.3 Matching Code Representation to Code Properties 

Compiler optimizations and procedural abstraction transform the executable code such that 
it requires less space to store and/or less time to execute. If the instruction decoding 
hardware can accept instructions with sizes varying by units of a bit, then there are 
additional possibilities. An information theoretic approach by Hehner [13] analyzes and 
recodes executable files based on the statistical properties of a sample of programs. These 
transformed files are smaller than the originals, but they require special hardware for 
instruction and data fetches (usually a custom decoder) and/or run-time support 
(interpretive execution). 

Part of the method revolves around finding the fewest number of bits necessary to 
represent the opcodes. A simple scheme examines a sample program set, finding the 
frequency of each opcode's use. Frequencies are turned into probabilities, and these in turn 
may be used to find the smallest possible encodings for the opcodes, such as those provided 
by Huffman encoding [15]. Frequently occurring opcodes are represented by fewer bits 
than those which are rarely used. The instructions are re-encoded, a dictionary of encodings 
attached, and a smaller executable is the result.
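
The simple frequency-based scheme can be illustrated in Python by computing Huffman code lengths for a sample opcode stream. The opcode names and counts are invented for the example, and huffman_lengths is a hypothetical helper, not code from [13] or [15].

```python
import heapq
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    # Heap entries: (frequency, tie-breaker, symbols in this subtree).
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1              # merged symbols sit one level deeper
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths

sample = ["ld"] * 8 + ["mul"] * 4 + ["st"] * 2 + ["br"] + ["nop"]
freqs = Counter(sample)
lengths = huffman_lengths(freqs)
huffman_bits = sum(freqs[op] * lengths[op] for op in freqs)
fixed_bits = 3 * len(sample)             # 5 opcodes need 3 bits each at fixed width
```

For this sample the frequent opcode ld gets a 1-bit code and the rare br and nop get 4-bit codes, for 30 bits in total against 48 bits at a fixed width of 3 bits. The size of the attached dictionary is not counted.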

However, instructions are usually not randomly distributed through an executable. 
Analysis of sample programs will show that some instruction pairs appear more frequently 
than others. In some architectures, one such pair may be a load into a register before an 
arithmetic operation. An analogy can be taken from the spelling of English words: if the 
letter "q" occurs in a word, then there is a high probability that a "u" is the next letter. 
Successors with a high probability are encoded in fewer bits than those with low 
probability. 

Probabilities may be found for m-tuples of instructions (for some fixed m > 1), and used
to find minimally sized encodings of the tuples. Re-encoding occurs as before. Taken to its
extreme, m may be the number of instructions in the executable, in which case the whole
program can be represented by one opcode, but the dictionary will have only one entry:
the program itself. There would be no space savings. The value of m must be chosen such
that the size of the program's new encoding plus the size of the dictionary of encodings is 
minimal. 

Hehner examines two methods that exploit conditional probabilities between opcodes: 
iterative pairing and conditional coding. 

Iterative pairing makes repeated passes over the executable. Candidates are pairs of 
opcodes that, when replaced by a single new opcode, provide a space savings over and 
above a limit specified by the user. At each pass, the best compression candidate is found. 
(The pairs may be generalized to m-tuples.) When no more candidates exist, iteration stops.
Having created an augmented instruction set (the original opcode set along with opcodes
for compressed pairs), the algorithm recodes the executable, appending a dictionary of the
new opcodes. 
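
The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than Hehner's algorithm: savings are counted in opcodes instead of bits, a dictionary entry is assumed to cost two opcodes, and the names replace_pair, iterative_pairing and the generated opcodes P0, P1, ... are hypothetical.

```python
from collections import Counter

def replace_pair(code, pair, new_op):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code) and (code[i], code[i + 1]) == pair:
            out.append(new_op)
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

def iterative_pairing(code, limit=0):
    """Repeatedly replace the most frequent adjacent pair by a new opcode
    while the estimated saving exceeds the user-specified limit."""
    code = list(code)
    dictionary = {}
    while True:
        pairs = Counter(zip(code, code[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        saving = count - 2               # assumed model: one opcode saved per
        if saving <= limit:              # occurrence, two spent on the dictionary
            break
        new_op = f"P{len(dictionary)}"
        dictionary[new_op] = pair
        code = replace_pair(code, pair, new_op)
    return code, dictionary

stream = ["ld", "add", "st", "ld", "add", "st", "ld", "add", "br"]
recoded, dictionary = iterative_pairing(stream)
```

Adjacent pairs are counted with overlaps, so a run of identical opcodes can make a pair look more frequent than the number of replacements actually performed.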

A more aggressive algorithm recognizes that the final encoding may represent a local
maximum of savings in the solution space of compression. If pairing continues where
the previous algorithm left off, compressing the remaining candidates will increase code size.
However, new candidates with excellent space savings may eventually appear. The final
result could be a better compression of the executable than if the search for candidates had 
stopped in accordance with the original algorithm. 

Conditional coding (see Figure 2.1) compresses by using m-tuples, for which conditioning
classes of order m-1 conditional probabilities are needed. In the figure, the instruction to be 
encoded is in the operation register (upper right). Previous instructions are in the context 
register (upper left), with the register acting as a sliding window over the code. When the 
data from these two registers are passed to the encoder, the minimum redundancy encoding 
of the m-tuple is placed in the code register (bottom) and is output to the new executable 
file. 



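[Diagram: the context register, holding the previous m-1 instructions (inst_{i-m+1}, ...,
inst_{i-1}), and the operation (code) register, holding inst_i, feed the encoder (decoder).
The encoder (decoder) consults a table of conditioning classes for all contexts of order
m-1 and writes its result to the code register at the bottom.]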



Figure 2.1 Conditional coding — encoder/decoder design 

Let the probability that $inst_i$ follows $inst_{i-m+1}, inst_{i-m+2}, \ldots, inst_{i-1}$ be equal to $p_i$. The
size of code $c_i$ corresponds to the number of bits required to represent the opcode in the
context given, which is [6]:

$$ |c_i| = -\log_2 p_i $$

An encoding whose size is equal to this number of bits is a minimal redundancy encoding. 
The decoding operation is similar to encoding, and the role of registers in decoding is 
indicated in parentheses in the figure. 

Experimental results were obtained by compressing the code of the IBM 360 version 
of XCOM, a compiler for the XPL programming language. An earlier transformation of the 
code changed instruction operands from a fixed sized to a variably sized representation. 
Using iterative pairing, the re-encoding of the XCOM code resulted in a 77.4% reduction 
in the space required for opcodes. (Data such as addresses and constants were encoded 
using another space saving algorithm.) Conditional coding using 4-tuples yielded an 80.2%
savings in opcode space. 

These results do have a cost. It is difficult and expensive to design and implement an 
instruction decoder for opcodes which vary in size down to a bit. Speed is required; 
therefore the decoding logic must be implemented in hardware. However, until the late 
1970s, when internal memory implementations switched from magnetic core technology to
integrated circuits, significant reductions in memory requirements of programs were worth
the cost of increased decoder complexity. As well as being useful for reducing the size of
executables, this research also helped computer architecture design. It applied rigour to the 
selection of instruction sets and opcode sizes, analyzing the properties of programs that are 
expected to run on a specific machine design. 

2.4 Compilation to Compact Code 

Using PL/I on an IBM System/370 as the experimental platform, Marks examined three 
code compaction methods [21]. These approaches compress executables by replacing long 
code sections with short subroutine calls. 

High-level subroutine recognition identifies redundancy in a programmer's source. 
The analysis uses the internal representation available immediately after lexing and 
parsing, with results matched against the original source statements. Each subroutine 
candidate, which is made up of source language statements, is assessed by the programmer 
who must then accept or reject it. Low-level subroutine recognition identifies repeated 
sections in the object code where the repetition has been introduced by the actions of the 
compiler. Compilers tend to produce the same instruction sequences over and over, and the 
method exploits this. Both the high- and low-level approaches modify some representation 
of the program, but after code generation, the smaller executables are loaded and run by the 
computer without any extra processing. 

Tailored interpretation takes place at the same level as low-level subroutine 
recognition, but takes advantage of very small sequences rejected as "not profitable" by the 
low-level scheme. The original object code sequences are replaced by custom opcodes, and 
the compressed code must be interpreted at run time. This produces very high space savings 
but at a great cost in speed. 

2.4.1 High-Level Subroutine Recognition 

To improve productivity, programmers often write code with the assistance of tools
such as lex, yacc, and others that generate source language stubs handling
communication or message passing details for some development environments. The
programmer saves time, and the generated source code is guaranteed to match the
programmer's specifications. If these tools are used several times during development, the 
same source code sequences might turn up several times. Programmers also repeat 
themselves when writing their own code, using idioms, and often within the same 
procedure. High-level subroutine recognition takes advantage of these types of repetition. 
In many ways it is easier to compress at this level as we are still some distance away from 
the complexity of the machine code. Ease in finding candidates comes at a cost, as they tend 
to be short sequences of source lines, and replacing these sequences by subroutine calls is 
not very profitable. Nearly identical sequences could be covered by one subroutine, but the 
exact cost of parameterizing a routine is difficult to calculate because of the distance from 
the machine code. Figure 2.2 shows the position of the compressor within the compiler.



source code -> scanner/lexer/parser -> n-address form -> compressor -> n-address form
            -> code generator -> executable image

(the programmer interacts with the compressor)



Figure 2.2 Position of compressor in system — High-level subroutine recognition 

Subroutine recognition: The PL/I compiler's n-address Object-Machine Independent 
Representation (OMIR) was used by Marks. In its search for candidates, the algorithm 
treats the n-address form as a string of operators and operands. That is, each operator and 
operand is a "character", and the whole program is one long string. Substrings match if 
they: 

• agree in all their operators (if substrings A and B belong to the same candidate, then
the i-th operator "character" in A matches the i-th operator "character" in B, for all i);
and

• differ in at most three other "characters" (this allows for up to three parameters).



for each operator Op do
    M[] := empty;
    P1 := first occurrence of Op in the OMIR;
    while there is a position P2 after P1 in the OMIR do
        m := LongestPossibleMatch( P1, P2 );
        add P1, P2 to list M[m];
        P1 := P2;
    end while;
    find the best subroutine candidate B, and query the programmer;
    if programmer accepts B then
        make subroutine based on B;
        replace OMIR instructions matching positions in B;
    end if;
end for;



Figure 2.3 Algorithm for High-level subroutine recognition 



For each operator in the instruction set, Figure 2.3's algorithm examines all substrings 
that begin with that particular operator. Since operators must match, constraining the start 
of substrings to operator positions is a useful optimization. Pairs of positions are taken 
together, and the longest possible matches (using the two-part definition from above) that 
start from these positions are recorded. 
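
The two-part match definition can be made concrete with a short sketch. This is not Marks's code: the token layout (a kind tag plus text), the function name and the rule that the two substrings may not overlap are assumptions made for illustration only.

```python
from typing import List, Tuple

# Each "character" of the n-address string is tagged as an operator ("op")
# or an operand ("arg"). This tuple layout is an assumption of the sketch.
Token = Tuple[str, str]

def longest_possible_match(code: List[Token], p1: int, p2: int,
                           max_params: int = 3) -> int:
    """Length of the longest match starting at positions p1 < p2.

    Operators must agree everywhere; operands may differ in at most
    max_params places. The match stops before the substrings overlap.
    """
    length = 0
    differences = 0
    while p2 + length < len(code) and p1 + length < p2:
        kind1, text1 = code[p1 + length]
        kind2, text2 = code[p2 + length]
        if kind1 != kind2:
            break                      # operator facing operand: no match
        if text1 != text2:
            if kind1 == "op":
                break                  # operators must be identical
            differences += 1           # a differing operand becomes a parameter
            if differences > max_params:
                break
        length += 1
    return length
```

Lowering max_params to 0 gives the "identical code only" rule used by the low-level scheme described later.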

Evaluating candidates: After searching for all matches, the algorithm finds the best 
subroutine candidate from the matches that begin with a particular operator. The 
programmer is allowed to accept or reject this candidate after seeing the line numbers 
corresponding to the suggested subroutine. If accepted, the subroutine body is added to the 
end of the program, and all matches in the OMIR are replaced by a subroutine call, each
call including the appropriate parameters. 

All operators are examined in turn. To keep the algorithm simple, influences between 
subroutine choices are ignored — the effect of compressing one subroutine before another 
is not examined. Marks does not indicate how much potential space savings are lost by this 
design decision.

Any manipulation of the source text is avoided even though rearranging the text may
improve the opportunities for compaction. One reason given for this is that it requires flow 
analysis, but surely this should be simple for n-address code. A more convincing reason 
against manipulation is that rearrangement might increase the distance between the source 
code and the transformed OMIR. The greater the distance, the more difficult it is for the 
programmer to assess the effect of accepting the subroutine. Marks notices that there are 
some situations where rearrangements are overkill. For example, two sections may differ 
where one contains an extra statement in the middle. Splitting the sections into two is the 
simplest solution, with similar code before and after the statement forming the new 
subroutines. A set of heuristics such as this could identify and exploit similar situations 
without the need for data flow analysis. 

Results: The space savings of this scheme are poor: 1% to 2%. No mention is made of
the influence of PL/I, or the intermediate representation, on the results. It is unclear how 
the results would transfer to languages currently in use, such as C or C++. One suggestion 
is that the programmer's time may be better spent rewriting or redesigning the code than in 
accepting or rejecting compaction candidates! At this level, automatic methods of 
accepting subroutine candidates were not examined. 

2.4.2 Low-level subroutine recognition 

Intuition leads us to expect more code repetition at the machine level than higher up in the 
source code. Marks follows this thinking by applying the high-level subroutine recognition 
algorithm to a version of machine code that he called low-level code. Features of this 
representation include instruction lengths that correspond to the final code lengths. Branch 
destinations are not resolved, but since the lengths of final instructions are known, 
estimates of space savings are accurate. Registers have already been allocated, and their 
reassignment would be expensive. Arguments must be passed into routines via registers, 
but if all registers are tied up, then subroutine parameterization is expensive. To avoid any 
register reassignment during subroutine calls, parameters are not allowed.

[Figure: source code -> scanner/lexer/parser -> n-address form -> analyzer/code generator -> assembly code]




Figure 2.4 Position of compressor in system — Low-level subroutine recognition 

Subroutine recognition: The algorithm differs slightly from that in Section 2.4.1. Before
finding matches, a hash chain is constructed for each opcode, linking together all positions 
where that opcode occurs. This reduces the time spent searching through the code. Each 
hash-chain is examined in turn, with matches performed on every pair of positions from the 
chain. Only identical code matches are accepted, although extra processing is added to 
examine "if-then-else" constructs. After a match is made, the pair of opcode positions,
the length of the match, and the opcode value itself are formed into a group. 

When all hash-chains have been processed, transitive closure merges groups together. 
After the merge, each group contains all the code positions that start with the same opcode 
and match to the same length. The groups become subroutine candidates. 
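
A minimal sketch of the hash-chain and merging steps follows. It assumes instructions are plain strings whose first word is the opcode, and it uses a union-find structure to stand in for the transitive closure; neither detail comes from Marks's description, and the "if-then-else" processing is left out.

```python
from collections import defaultdict
from itertools import combinations

def match_length(code, p1, p2):
    """Number of identical instructions starting at p1 and p2, without overlap."""
    n = 0
    while p2 + n < len(code) and p1 + n < p2 and code[p1 + n] == code[p2 + n]:
        n += 1
    return n

def build_groups(code):
    # Hash chain: opcode -> every position where that opcode occurs.
    chains = defaultdict(list)
    for pos, ins in enumerate(code):
        chains[ins.split()[0]].append(pos)

    parent = {}  # union-find over (position, match length) nodes

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Match every pair of positions on each chain and merge the pairs
    # that share a position and a length (the transitive closure).
    for chain in chains.values():
        for p1, p2 in combinations(chain, 2):
            n = match_length(code, p1, p2)
            if n > 0:
                parent[find((p1, n))] = find((p2, n))

    groups = defaultdict(set)
    for pos, n in list(parent):
        groups[find((pos, n))].add(pos)
    # Each group is reported as (match length, positions).
    return sorted((root[1], tuple(sorted(m))) for root, m in groups.items())
```

Groups of length one also appear in the output; in the scheme described above they would be eliminated during candidate evaluation as too short.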

Evaluating candidates: Groups are placed into a list, and the list is ordered by each 
group's space savings potential. Those groups corresponding to subroutines which are too 
short or too infrequent are eliminated as they result in little or no space savings. 
Overlapping and nested groups are retained even though some compaction choices become 
difficult.

To achieve optimal compaction, searching through all the possible orderings of
subroutine construction would be necessary. The number of orderings increases if groups
are allowed to be broken up. New compaction possibilities appear when additional groups 
are formed through the splitting of overlapping groups. Deciding whether and which nested 
routines should be compressed will make analysis even more difficult. A moderately sized 
program would end up needing a lot of computing power during compaction. Since finding 
an optimal solution is too costly, a greedy heuristic is used, which, among other things, does 
not consider the effect of subroutine choices on the speed of the final code. 

Subroutine selection begins by taking the group at the head of the list and adding it to 
a collection of "best groups." Proceeding down the list, groups are added to the "best" 
collection in the order they appear in the list, but with some constraints. None of the groups 
in the collection are allowed to be nested, and no groups may overlap. Marks pointed out 
that it is short sighted to eliminate less profitable groups because they overlap a more 
profitable group. The algorithm is searching for the best net gain: if rejecting the good 
group and accepting the overlapping bad groups leads to a better net gain, then this is the 
decision taken. Different orderings of the original list (best gain to worst; worst gain to best; 
randomized) do not result in great variations in the final space savings. 

The possibility of using parameters in subroutines was raised, but it was suggested 
that the benefit from this depended on a machine's architecture. If registers are used to pass 
parameters into subroutines, then these expensive resources are not available to the rest of 
the code. A result would be poorer overall code quality for the rest of the program. This is 
reasonable given that register allocation is more difficult when some registers have already 
been used for parameter passing. Any other parameter passing mechanism would be more 
space intensive, wiping out some of the benefits of the space savings. 

Results: Reported compaction ratios were encouraging, with space savings ranging from 
5.0% to 20.1%, averaging out at 15%. Compressed code executes 15% slower. Larger 
programs were better compressed, but the correlation between compaction ratios and low-level
code size was non-linear. More efficient instruction matching techniques would
reduce the contribution of compaction to compile times. No numbers were given for the
potential loss in compaction from disallowing parameters. It may indeed depend on the
machine architecture, but a rough estimate would have been helpful.

2.4.3 Tailored Interpretation 

After examining many code examples, Marks observed that groups rejected during low- 
level compaction could be used, but only if a cheaper subroutine calling mechanism could 
be found. This is intuitively appealing. We expect many matches that are two or three 
instructions long, many more than those that are, say, ten instructions long. However, all of 
the subroutine calling mechanisms require space. As the size of the average subroutine gets 
smaller, the limit to compaction becomes the size of the calling sequence. 

A solution to the calling sequence problem is to tailor the instruction set, matching the 
characteristics of the code. The tailored set is implemented through software. This is similar
in spirit to Hehner's "iterative pairing" [13]. Marks acknowledges the influence of the idea, 
but instead of putting the effort into the hardware encoder/decoder, he puts it into an 
interpreter. 

Opcodes not used by the program take on a new meaning, by becoming indices into a 
table of the subroutines. The subroutines are unique to each program. Compressed code 
sizes include the space needed for the compressed code, the interpreter, and its tables. 
Figure 2.5 is a schematic of the system as it interprets one instruction. 

Compressed code will now be made up of three different types of instructions: 

• Ordinary machine instructions executed directly by the hardware with no interpretation.

• Parameterless subroutines. 

• Parameterized subroutines. 

Arguments for the last instruction type are placed after the opcode, with the interpreter 
copying the argument value into the subroutine code. As shown in Figure 2.5, R2 in the 
unparameterized code corresponds to the parameter position in the parameterized routine. 
The grey data value from the compressed code's instruction is copied into the routine. A
maximum of one parameter per subroutine is allowed, a reasonable requirement
considering the small size of the subroutines. 
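
The dispatch idea can be illustrated with a toy interpreter. Everything here is hypothetical: the three-instruction "hardware" set, the tuple encoding and the PARAM marker are assumptions chosen to keep the sketch short, and have nothing to do with the System/370 instruction set used by Marks.

```python
BASE_OPS = {"load", "add", "neg"}   # toy "hardware" instructions (assumed)
PARAM = object()                    # marks the single parameter slot in a routine

def execute_base(ins, regs):
    """Execute one ordinary instruction directly."""
    op, *args = ins
    if op == "load":                # load reg, constant
        regs[args[0]] = args[1]
    elif op == "add":               # add dst, src
        regs[args[0]] += regs[args[1]]
    elif op == "neg":               # neg reg
        regs[args[0]] = -regs[args[0]]

def run(program, table, regs):
    """Run compressed code; any opcode outside BASE_OPS indexes the table."""
    for ins in program:
        op, *args = ins
        if op in BASE_OPS:
            execute_base(ins, regs)
        else:
            # Tailored instruction: copy the argument (if any) into the
            # parameter slot of the routine body, then execute the body.
            arg = args[0] if args else None
            body = [tuple(arg if field is PARAM else field for field in template)
                    for template in table[op]]
            run(body, table, regs)  # recursion permits nested routines
    return regs

table = {
    "T0": [("load", "r1", PARAM), ("add", "r0", "r1")],  # parameterized
    "T1": [("neg", "r0"), ("neg", "r0")],                # parameterless
}
program = [("load", "r0", 1), ("T0", 5), ("T0", 7), ("T1",)]
```

Each use of "T0" costs one opcode plus one argument in the compressed program, which is why very short routines become worthwhile once the calling sequence shrinks to this size.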

Subroutine recognition: The method is taken from low-level compaction, but with the 
exception that matches may allow one operand position to differ. Nested subroutine calls 
are also permitted. 

Evaluating Candidates: The selection of candidates is more complicated because of 
parameters and nesting, and is implemented through a multi-pass algorithm: 

1. A list of groups is created in order of descending space savings potential.

2. Groups are considered in list order, provided that the group (a) has the largest space
savings of those lower in the list and (b) does not overlap a selection with a greater
space savings. As each group is selected, it is converted into a subroutine.
Occurrences in the main program are replaced by a call to the new subroutine using
the tailored instruction set.

3. If groups were rejected because of step 2, repeat the algorithm from step 1.

After all possible groups are compressed, the interpreter table will be complete, and 
the compressed code is saved along with the table and the code for the interpreter itself. 

Results: At 50%, the compression ratio is impressively high, and is due to the use of 
parameters and to the utilization of smaller (two or three instruction) matches. Compression 
in one test case could have exceeded 50%, but the lack of unused opcodes imposed a limit. 
However, the code may run up to 15 times slower, and is an indication of the cost of 
interpreting each tailored instruction. Not all instructions depend on the interpreter — 
operating system calls and noninterpreted subroutines run at full System/370 speed. If 
space savings are needed, but specific sections of the code must run at full speed, then the 
"full speed" code could be tagged, and the compiler would not compress the tagged code. 

Figure 2.5 Schematic of mechanism for tailored interpreter.

2.5 Analyzing and Compressing Assembly Code

In a conference paper by Fraser, Myers and Wendt [8], code is compressed at the same level
as Marks's low-level algorithm. The target machine is a PDP-11, and the compressor runs
on a VAX 11/780, but there is no description of the high-level language or its compiler.
Compaction occurs after the generation of assembly code and before
assembly. Their technique (abbreviated to "FMW") differs from that of Marks in:

• the algorithm for subroutine recognition, based on the construction of a suffix tree; 
and 

• the algorithm for identifying and classifying candidates as open or closed, where the 
cost of converting the former into subroutines is less than that for the latter. 

Subroutine recognition: The major computational effort during compaction takes place
while finding and evaluating subroutine candidates. Borrowing an idea from text
compression, Fraser et al. reduce the search costs by building a suffix tree. While there are
many efficient string searching algorithms, repeated substring searches over the same
string benefit from an auxiliary index to the main string. The suffix tree is an example of
such an index (see Figure 2.6 for an example).

A substring is described by the path from the suffix tree root node to an inner node or 
to a leaf node. The text of the substring is formed when the path edge labels are 
concatenated together. If an inner node has m leaf-node descendants, then there are m
occurrences of the substring described by the path from the root. Leaf nodes hold the 
starting positions of these substrings, and there are as many of these nodes as there are 
characters in the main string. 

For example, the suffix tree in Figure 2.6 has been built for the string "ababaabb",
which is terminated by the end-of-string symbol "$". The path of heavy edges from the root
node to a leaf corresponds to the substring "abb", which starts at position 6 in the main
string. Three heavily circled leaf nodes containing 3, 1 and 6 are the leaf node descendants
of the shaded internal node. Notice that the path from the root to the shaded node constructs
"ab". This short substring appears three times in the main string, starting at positions 1, 3
and 6. Suffix trees for assembly language programs are much larger, and require algorithms
that build the index in linear time and reasonable (polynomial) space [19]. As it contains
indexes to all substrings, we can also get answers to such questions as, "What is the longest
substring that occurs in k places?" [25] Other researchers have used the suffix tree as the
basis for regular text compression [23].

Figure 2.6 Suffix tree example: "ababaabb$"
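
The repeated-substring query can be reproduced on the example string without building a tree. The sketch below swaps in a naively sorted list of suffixes (in effect a suffix array) for the suffix tree; it answers the same question but takes far more than linear time, so it illustrates the query rather than the FMW implementation.

```python
def repeated_substrings(text, min_len=2):
    """Map each repeated substring to its sorted 1-based start positions."""
    # Sorting the suffixes places all suffixes that share a prefix next to
    # each other, which is what the subtree below an inner node represents.
    suffixes = sorted(range(len(text)), key=lambda i: text[i:])
    found = {}
    for a, b in zip(suffixes, suffixes[1:]):
        # Longest common prefix of two neighbouring suffixes.
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        # Every prefix of that common part occurs at both positions.
        for length in range(min_len, k + 1):
            found.setdefault(text[a:a + length], set()).update((a + 1, b + 1))
    return {sub: sorted(pos) for sub, pos in found.items()}

occurrences = repeated_substrings("ababaabb")
print(occurrences["ab"])
```

Treating each assembly instruction as one "character" turns the same query into a search for repeated instruction sequences.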

By treating the assembly language text as one string, substrings with several 
occurrences correspond to potential subroutines. Assembler directives are flushed directly 
to the compressed file as these must be left unmodified. For the rest of the assembly code, 
the authors restrict the possible matches in the text: two instructions at different positions 
are the same if the instruction texts are identical. Instructions that differ by one or more 
operands are rejected, so this technique does not allow for parameterized subroutines. This 
restriction is relaxed for branches, allowing for the possibility that different labels may 
refer to the same destination. 

Evaluating candidates: After finding the substrings, or "fragments," each is evaluated for
subroutine potential. As with Marks's low-level subroutine recognition, infrequent or short
fragments are discarded. The remaining fragments are grouped into two types. Open
fragments end with a non-relative branch. When turned into subroutines, they are accessed
by an unconditional branch, which is inexpensive. Closed fragments exit to the instruction
following the fragment. To preserve the original control flow when using closed fragments,
only the subroutine call may be used to access its corresponding routine. (See Table 2.5 for
examples of open and closed types, where fragments are in boldface.) Because of the extra
instructions needed when converting to a subroutine, closed fragments will not always
result in a space savings. An evaluation function estimates the space savings potential of a
fragment, taking into account these restrictions. Fragments of either type that produce
"negative" savings are discarded.



Table 2.5 Closed and open fragments: example

closed                    open

push D, [sp]              st   B, r6
ld   r1, A                push D, [sp]
ld   r2, B                push E, [sp-4]
mul  r3, r2, r1           br   L5
st   r3, B                L1: add r3, r3, r4
push E, [sp]              sub  r2, r1, r4
ld   r1, A                push D, [sp]
ld   r2, B                push E, [sp-4]
mul  r3, r2, r1           br   L5
mul  r4, r4, r3           L2: add r4, r1, r1



A fragment's control flow falls off the end if its last instruction is not an unconditional 
branch or subroutine return. Falling through to a fragment occurs when it is not accessed 
via a branch or a subroutine call. Since open fragments are accessed by an unconditional
jump or a fall through, they cannot exit by falling off the end: relocating the fragment's
routine will change instruction order, and we cannot guarantee that the code following the
routine has not changed. Closed fragments are entered by a subroutine call, so falling into
them or entering them by a branch will result in incorrect code. Open and closed fragments 
must contain the destination of any internal relative branch, otherwise the flow of control 
is not guaranteed to reach the exit of the fragment. Overlapping fragments are discarded. 
Finally, closed fragments must pop off the stack exactly what they pushed on. Without this 
final restriction, the return address may not be on the top of the stack at fragment exit 
(subroutine return). 

An optimal algorithm is not used to choose fragments. Instead, a greedy algorithm 
orders the fragments by the value of the evaluation function, examining them in order of
decreasing space savings. If a fragment satisfies all criteria, it is converted into a
subroutine. The code for the body is placed out-of-line. All occurrences of the fragment in 
the assembly text are replaced by unconditional branches (open fragment) or calls (closed 
fragment) to the subroutine. The process is repeated until there are no fragments left. 
Different fragment orderings may result in better compaction, but such orderings were not 
investigated by the authors. 
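
The evaluation and selection steps can be sketched as follows. The cost model is an assumption: sizes are counted in instructions, a branch, a call and a return each cost one instruction, and the list is ranked once rather than re-evaluated after every conversion. The FMW paper's actual evaluation function is not reproduced in the survey above.

```python
def estimated_savings(length, occurrences, kind):
    """Instructions saved by turning a fragment into an out-of-line routine."""
    removed = length * occurrences
    if kind == "open":
        added = occurrences + length        # one branch per occurrence + body
    elif kind == "closed":
        added = occurrences + length + 1    # one call per occurrence + body + return
    else:
        raise ValueError(kind)
    return removed - added

def greedy_select(fragments):
    """Pick fragments in order of decreasing savings, skipping overlaps."""
    def savings(frag):
        return estimated_savings(frag["length"], len(frag["positions"]), frag["kind"])

    claimed = set()     # instruction positions already inside a chosen fragment
    chosen = []
    for frag in sorted(fragments, key=savings, reverse=True):
        if savings(frag) <= 0:
            continue                        # "negative" savings are discarded
        covered = {p + i for p in frag["positions"] for i in range(frag["length"])}
        if covered & claimed:
            continue                        # overlapping fragments are discarded
        claimed |= covered
        chosen.append(frag["name"])
    return chosen

fragments = [
    {"name": "A", "kind": "closed", "length": 5, "positions": [0, 10, 20]},
    {"name": "B", "kind": "open",   "length": 3, "positions": [3, 30]},
    {"name": "C", "kind": "closed", "length": 4, "positions": [40, 50, 60]},
    {"name": "D", "kind": "closed", "length": 2, "positions": [70, 80]},
]
```

Under this model a closed fragment of two instructions occurring twice loses one instruction, which shows why short, infrequent fragments are dropped.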

Results: After applying the compaction algorithm to a set of over 150 UNIX utilities, the 
authors reported space savings ranging from 0% to 39%, averaging at 7%. The compressed 
binaries took 1 to 5% more CPU time, but this was offset by reduced loading time, and the 
result was an 11% time savings. However, the wide range of compaction ratios raises
questions. Why does some code compress well? Does it matter if programs are large or
small? It is unclear how well the figures would hold for instruction sets of architectures
currently in use. Much has changed since 1984, both in machine architectures (RISC, 
superscalar) and compiler technology (instruction scheduling, software pipelining). When 
compared to the results of Marks's low-level scheme, the average of 7% is poor. This may 
be due to the size of UNIX utilities: if applied to the PL/I compiler code, the average 
compaction might improve. How much the FMW algorithm would benefit from 
parameterized subroutines is unclear. 

2.6 Looking Forward 

There are three parts to the code compaction problem: 

• finding subroutine candidates;

• evaluating subroutine candidates; and 

• choosing subroutine candidates. 

Greedy algorithms solving the last problem were used in the two previous papers. They
were used despite the better space savings that may be possible through careful
ordering of subroutine construction. The impact that small fragment differences have on
compaction rates was ignored. Solutions to the second problem, when applied at the level 
of assembly code, could not deal with candidates that needed parameters.

The first problem is solved efficiently through the use of suffix trees, and this is what
we use to find subroutine candidates. We are interested in seeing the effect on code
compaction of a search of the solution space of subroutine parameterizations, which we
investigate in Chapters 3 and 4. A Sun 4 SPARC workstation was the platform for our
experiments and our algorithm evaluation. UNIX utility binaries were used as input for the
compaction algorithms. (The specific UNIX utilities used in the FMW paper were not
specified, but that will not rule out comparisons with their results.) Since the executable
code is disassembled, no compilers need to be modified: all compaction is performed on 
code that is already loadable and executable. Chapter 5 summarizes the results; it suggests 
possible improvements and future research directions.

3 Compaction and its Problems

Procedural abstraction replaces many instructions with a call to one procedure. This 
achieves compaction, which is a transformation of a program into a smaller binary that is 
immediately loadable and executable. The compacted binary needs neither
"uncompaction" nor run-time support (e.g. interpreters as in Marks's "tailored
interpretation"). Since "similar sections of code" is the intuition behind procedural 
abstraction, we can visually identify repeated appearances of the same code, thereby easily 
developing transformations leading to space savings. For instance, the left-side program in
Figure 3.1 is an abbreviated printout of a disassembled binary, and similar sections of code
are highlighted by boldface type. On the right-hand side, the boldfaced code from the left
has been replaced by subroutine calls, and one copy of the body attached to the end of the
program.

before                                   after

2020: clr   %fp                          2020: clr   %fp
2024: ld    [ %sp + 0x40 ], %o0          2024: ld    [ %sp + 0x40 ], %o0
2028: add   %sp, 0x44, %o1               2028: add   %sp, 0x44, %o1
202c: sll   %o0, 2, %o2                  202c: sll   %o0, 2, %o2
2030: add   %o2, 4, %o2                  2030: add   %o2, 4, %o2
2034: add   %o1, %o2, %o2                2034: add   %o1, %o2, %o2
207c: sethi %hi(0x2000), %g1             207c: sethi %hi(0x2000), %g1
2080: call  0x16090                      2080: call  0x17000
2084: ld    [%l0+0x220], %o0             2084: ld    [ %g1 + 0x200 ], %g1
2088: ld    [%l0+0x220], %o1
208c: add   %o0, %o1, %o0                20d4: or    %o3, 2, %o3
2090: neg   %o1, %o1                     20d8: call  0x17000
2094: ld    [ %g1 + 0x200 ], %g1         20dc: mov   %l6, %o4
20e4: or    %o3, 2, %o3                  23b4: and   %o2, %o1, %o1
20e8: call  0x16090                      23b8: call  0x17000
20ec: ld    [%l0+0x220], %o0             23dc: b     0x23e4
20f0: ld    [%l0+0x220], %o1
20f4: add   %o0, %o1, %o0                17000: call  0x16090
20f8: neg   %o1, %o1                     17004: ld    [%l0+0x220], %o0
20fc: mov   %l6, %o4                     17008: ld    [%l0+0x220], %o1
                                         1700c: add   %o0, %o1, %o0
23d4: and   %o2, %o1, %o1                17010: neg   %o1, %o1
23d8: call  0x16090                      17014: return
23dc: ld    [%l0+0x220], %o0
23e0: ld    [%l0+0x220], %o1
23e4: add   %o0, %o1, %o0
23e8: neg   %o1, %o1
23ec: or    %l7, 0x248, %l7

Figure 3.1 Compaction example

Space savings are modest: 15 instructions are removed from the original, and 9 are 
added, resulting in a net savings of 6 instructions. Each of the replaced segments of code
is identical and needs no parameters. If many such transformations are applied, substantial
savings may result.

3.1 The Problem 

The FMW technique has already handled the "no parameter" case. What happens to 
compaction if the sections of code are similar but not identical? We have copied some 
similar sections of code from a program and placed them in Figure 3.2. Their differences 
are highlighted by boldface type. In the same manner as a high-level language procedure 
call, we pass in the differing value to the new routine via a parameter. Figure 3.3 shows the
result of procedural abstraction, including the copying of values into the parameter passing 
variable. The procedure body has been placed in the region beyond the "end of text" of the 
original program. 



2540: ld  [%o0+0x30], %o0     25f0: ld  [%o0+0x30], %o0     263c: ld  [%o0+0x30], %o0
2544: mov 0x9400, %o1         25f4: mov 0xa000, %o1         2640: mov 0xa000, %o1
2548: mov %l4, %o5            25f8: mov %l4, %o5            2644: mov %l4, %o5
254c: b   0x16210             25fc: b   0x16210             2648: b   0x16210

25d0: ld  [%o0+0x30], %o0     2610: ld  [%o0+0x30], %o0     2800: ld  [%o0+0x30], %o0
25d4: mov 0xa400, %o1         2614: mov 0xa400, %o1         2804: mov 0xb000, %o1
25d8: mov %l4, %o5            2618: mov %l4, %o5            2808: mov %l4, %o5
25dc: b   0x16210             261c: b   0x16210             280c: b   0x16210



Figure 3.2 Candidate instances — one parameter required 



2540: mov  0x9400, %g4        25f0: mov  0xa000, %g4        263c: mov  0xa000, %g4
2544: call etext+0x200        25f4: call etext+0x200        2640: call etext+0x200

25d0: mov  0xa400, %g4        2610: mov  0xa400, %g4        2800: mov  0xb000, %g4
25d4: call etext+0x200        2614: call etext+0x200        2804: call etext+0x200

etext+0x200: ld  [%o0+0x30], %o0
etext+0x204: mov %g4, %o1
etext+0x208: mov %l4, %o5
etext+0x20c: b   0x16210

Figure 3.3 Candidate instances: turned into a subroutine

In Figure 3.2 it is easy to find the number and the positions of parameters by simply
comparing operand values for all instances, one operand position at a time. If two instances
have different values at one position, then a subroutine covering the instances needs a
parameter at that location. Our example above uses one parameter for the first operand of
the second instruction. As more instances are covered by one subroutine body, the
compaction rate improves. One way of covering many instances by one body is by passing
in the differences to the procedure. In our example, this is done for differences in operand
values, but it could also be applied to differences in instruction operations. This thesis will
concentrate on the former.
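
The position-by-position comparison can be written down directly. In this sketch an instance is a list of (opcode, operand, ...) tuples; that representation and the function name are assumptions for illustration. The data are the six instances of Figure 3.2.

```python
def parameter_positions(instances):
    """Return (instruction index, operand index) pairs where instances differ.

    All instances are assumed to have the same opcodes and the same
    number of operands per instruction; field 0 of each tuple is the opcode.
    """
    positions = []
    for i, ins in enumerate(instances[0]):
        for j in range(1, len(ins)):
            values = {instance[i][j] for instance in instances}
            if len(values) > 1:         # two instances disagree here
                positions.append((i, j))
    return positions

def make_instance(constant):
    return [("ld", "[%o0+0x30]", "%o0"),
            ("mov", constant, "%o1"),
            ("mov", "%l4", "%o5"),
            ("b", "0x16210")]

figure_3_2 = [make_instance(c)
              for c in (0x9400, 0xa000, 0xa000, 0xa400, 0xa400, 0xb000)]
print(parameter_positions(figure_3_2))
```

The single reported position is the first operand of the second instruction, which is the parameter passed through %g4 in Figure 3.3.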

The compaction problem reveals a trade-off between coverage and parameters: if a 
procedure covers too many instance differences then parameter costs will erase space 
savings, but allowing no parameters can eliminate many opportunities for compaction. We 
propose a compromise by reducing the number of parameters by covering the instances 
with several procedure bodies (not just one), mapping instances to procedures in such a 
way that the number of parameters leads to costs that are tolerable.

For example, covering all five instances in Figure 3.4 by one procedure would require
many parameters. However, there is no need to turn all of them into procedure calls.
Excluding E, we could group together the others (A, B, C and D) and cover them with one
procedure. Operand values that are identical across each instance require only one
parameter, thereby reducing the number of instructions needed for parameter copying.
Another grouping takes advantage of the fact that instances B and C are identical, and
covering these two by a procedure will not require parameters. Excluding A, D and E from
compaction may or may not produce a space savings. These combinations must be
examined and evaluated before making a compaction decision.

587c: sethi %hi(0xa800), %o7
5880: ld    [ %o7 + 0xc8 ], %o7
5884: sll   %o5, 2, %o5
5888: st    %o0, [ %o7 + %o5 ]
588c: sethi %hi(0xa000), %o0
5890: ldsb  [ %o0 + 0x2ec ], %o0
5894: tst   %o0

B
5b70: sethi %hi(0xa800), %o3
5b74: ld    [ %o3 + 0xc8 ], %o3
5b78: sll   %o1, 2, %o1
5b7c: st    %o0, [ %o3 + %o1 ]
5b80: sethi %hi(0xa000), %o4
5b84: ldsb  [ %o4 + 0x2ec ], %o4
5b88: tst   %o4

5ce0: sethi %hi(0xa800), %o7
5ce4: ld    [ %o7 + 0xc8 ], %o7
5ce8: sll   %o5, 2, %o5
5cec: st    %o0, [ %o7 + %o5 ]
5cf0: sethi %hi(0xa000), %o2
5cf4: ldsb  [ %o2 + 0x2ec ], %o2
5cf8: tst   %o2

65d4: sethi %hi(0xa800), %o5
65d8: ld    [ %o5 + 0xc8 ], %o5
65dc: sll   %o5, 2, %o4
65e0: st    %o0, [ %o5 + %o4 ]
65e4: sethi %hi(0xa000), %o7
65e8: ldsb  [ %o7 + 0x2ec ], %o7
65ec: tst   %o7

5bec: sethi %hi(0xa800), %o3
5bf0: ld    [ %o3 + 0xc8 ], %o3
5bf4: sll   %o1, 2, %o1
5bf8: st    %o0, [ %o3 + %o1 ]
5bfc: sethi %hi(0xa000), %o4
5c00: ldsb  [ %o4 + 0x2ec ], %o4
5c04: tst   %o4

Figure 3.4 Instances supporting several procedure mappings
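
One way to read the remark about shared parameters is that operand positions which always change together can be served by a single parameter. The sketch below counts parameters under that reading for a candidate grouping; the flat operand tuples are made-up toy data, not a transcription of Figure 3.4, and the function name is hypothetical.

```python
def count_parameters(instances):
    """Parameters needed to cover the given instances with one procedure.

    Each instance is a flat tuple of operand values. A column is one operand
    position viewed across all instances; columns that vary in exactly the
    same way are assumed to share one parameter.
    """
    varying = {column for column in zip(*instances) if len(set(column)) > 1}
    return len(varying)

# Toy instances: the register used at positions 0 and 1 always changes together.
A = ("r7", "r7", "r5", "r0")
B = ("r3", "r3", "r1", "r4")
C = ("r3", "r3", "r1", "r4")    # identical to B

print(count_parameters([B, C]))       # grouping of identical instances
print(count_parameters([A, B, C]))    # wider grouping
```

Comparing the counts for different groupings, together with the per-parameter copying cost, is the kind of evaluation the groupings above call for.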

Finding the best space saving requires looking beyond exact matches. Through clever 
grouping of instances and use of procedure parameters, we can outperform the "exact 
match only" FMW algorithm. Our task is to find these groupings without an exhaustive 
search. In the next chapter we will present and evaluate different heuristic solutions, but 
first there are some preliminary details to be covered. The rest of this chapter covers the 
disassembly and procedure search phases of our compaction scheme. 

3.2 Preliminaries to Procedure Searching 

Compaction through procedural abstraction changes the order of machine instructions 
within the binary file, but does not alter the result of computation. Marks [21] and Fraser 
et al. [8] compact the assembly language representation of the program. Many compilers
already translate source code into this representation (e.g. gcc and lcc), passing it on to
an assembler as the last phase of processing. Therefore, compaction could be added as part 
of a compiler. 

In the previous two approaches, compaction can only be performed if the source code 
is available (i.e. "no source = no compaction"). A disassembler avoids this restriction. It
not only makes the assembly code available from the binary but also processes all of the code
linked together as a multi-module system, so the disassembled instructions correspond
exactly to the code found in the binary. What we see (the binary) should be what we get 
(the disassembled code). 

Disassembly has one drawback: separation of code from data is not always possible. 
The best we can do is find specific sections of the binary that are guaranteed to be code. Symbol
table information, data from binary interface headers, and knowledge of compiler 
conventions provide the precise locations of some instructions. However, it is expensive to 
look for code beyond the information from these sources. Even with techniques such as 
abstract execution of code, it is still impossible to guarantee that all instructions have been 
found [14]. Thankfully, the output from the disassembler using only symbol table 
information is still quite large. Rather than worry about increasing the coverage of 
disassembled instructions, an economical solution in this situation is to compact what is
guaranteed to be code, and leave the rest unmodified. 

3.2.1 Disassembly 

The disassembler from GNU's source-level debugger gdb was used after several simple 
modifications [22][24]. gdb's code is very portable, supporting many different hardware 
platforms, and each supported platform has a corresponding machine description file. 
Whereas Sun 4 SPARC's description consists of 1300 non-comment, non-blank lines of
code, machines with more complicated instruction sets (e.g. Motorola 68030, Intel 80486) 
have larger descriptions. 

SPARC's fixed instruction alignment on 4-byte boundaries makes possible a one pass
disassembler, and because symbol tables are not absolutely necessary, stripped SPARC
binaries may be compacted. There may still be occasions during disassembly where
function entry addresses, data locations and other information from the symbol table would 
make it easier to distinguish between code and data. On CISC machines with varying 
instruction sizes, the symbol table is a necessity. There would be little progress without the 
table as a given section of the binary may be disassembled several ways depending on 
which address is the start of an instruction. Entry points and function addresses from the 
table can resolve this type of ambiguity. 

Only SPARC binaries were compacted in this project. We assumed that the symbol 
table information was not available. However, if the binary is not stripped, then function 
addresses in the symbol table must be updated to reflect the positions of the functions in the 
smaller binary. 

3.2.2 Separating Code from Data 

The binary code was mapped into memory, and every instruction was separately
disassembled by a call to print_insn_sparc(). Arguments for the procedure include the
address of the instruction to be examined. The results returned are the readable text of the
instruction and the address of any branch destination. Illegal instructions are those which
the disassembler does not recognize; they are automatically marked off as data. After the
binary is disassembled, all basic block information is examined, and sections that do not
fall within a basic block are also marked as data.
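
The first of these two passes can be illustrated with a minimal sketch. It assumes a hypothetical decode() callback standing in for the disassembler (it is not the gdb interface), and toy_decode() is an invented rule used only to make the example runnable. The later basic-block pass is not shown.

```python
import struct


def split_words(image: bytes):
    """Split a binary image into big-endian 32-bit words (one per fixed-size instruction)."""
    count = len(image) // 4
    return list(struct.unpack(f">{count}I", image[:count * 4]))


def mark_code_and_data(words, decode):
    """Label each word 'code' if decode() recognizes it, otherwise 'data'."""
    labels = []
    for word in words:
        labels.append("data" if decode(word) is None else "code")
    return labels


def toy_decode(word):
    """Hypothetical decoder: rejects the all-zero word, accepts everything else."""
    return None if word == 0 else f"insn 0x{word:08x}"


# Three sample words; the middle one is rejected by the toy decoder.
image = bytes.fromhex("9de3bf98" "00000000" "01000000")
labels = mark_code_and_data(split_words(image), toy_decode)
print(labels)
```

A single left-to-right sweep suffices here only because every instruction occupies exactly one aligned 4-byte word.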

Some data sections may contain jump tables. Since compaction will change the
location of code, the table entries may need updating to reflect the addresses
of the moved jump targets. Both Sun's C compiler cc and GNU's C compiler gcc place
the table immediately following the jump instruction. Finding the length of the table helps
separate code from data. Instructions indicated by cc jump table entries precede the code
that performs table lookups. For gcc, the instructions are located immediately following
the last entry in the jump table. However, compilers for other languages may position
instructions and tables according to different conventions, so the problem of finding jump
table locations may prove difficult. One solution using program slicing has been described
in the literature, and does not depend on knowledge of specific compiler implementations.
Program slicing finds the subset of instructions which determine the value of the indirect
branch register. From this subset, the location of the jump table is usually deducible
[18][28]. However, we make the simplifying assumption that the binaries given to the
compactor were produced by either cc or gcc.

Delay slots on RISC architectures add a complication. Some compilers move into this 
slot the instruction immediately preceding a branch or jump. Therefore a fragment 
containing a branch or call must keep the branch instruction and its delay slot in the same 
procedure. However, there are cases where a delay slot instruction is also a branch 
destination, which usually occurs when two code blocks are fused together by an optimizer 
(a space saving optimization). If the slot belongs to a branch instruction, and if that branch 
is the exit from a procedure, then a simple solution to the problem is to make another copy 
of the delay slot instruction, keeping one copy in the procedure and the other within the 
original program [5]. 

3.2.3 Difficult Code 

Several classes of code are difficult to compact because it may not be possible to guarantee 
the correctness of their compacted versions. 

Self-modifying code treats some address contents as both code and data. For example, 
a direct dynamic linker patches in the address of the called subroutine at run-time by 
modifying the program [16]. Self-modifying schemes usually assume a fixed distance 
between the instruction to be changed and the instructions performing the change. Moving 
either instruction will overwrite some other address. Not only is it difficult to recognize 
self-modifying code, but it will also be hard to ensure that it will work after compaction. 

Arithmetic on branch destinations, usually performed for indirect branches, requires
some data flow analysis for detection. Jump table lookups are easy to recognize, but the
results of other arithmetic on registers might not be used until some instructions later.
Determining the possible destinations through data flow analysis is necessary because
some destinations may lie in sections considered data, which should then
be viewed as code. So far, no code that has been disassembled has contained indirect
branches of this sort, and we assume that this will not change.

Some programs calculate an address based on the checksum of the instructions, 
usually for security purposes — either for copy protection, or to check against tampering 
of a file during transmission across a network. By definition, these files will not work after 
the slightest program change, and this rules out their compaction. 

3.3 Procedure Searching 

We have used McCreight's suffix tree algorithm [19] to find procedure possibilities. This 
algorithm is also the basis for the FMW procedure search, but our point of departure is in 
the treatment of the program. Where FMW is looking for instances of exact textual matches 
in the assembly code, we are only looking for instances where operations match and allow 
operands to differ between instances. 

Some properties of the suffix tree data structure have already been investigated in 
Section 2.5. Our implementation of the suffix tree algorithm accepts a string of operations 
as input, and constructs a tree with all possible substrings, and their positions, within the 
larger string. All substrings and positions are then examined (see next section), and passed 
to the algorithms that attempt to find the groupings of instances that will produce the best 
space savings given the substring instances (next chapter). 

For example, suppose we are given the following contrived program:

    add     %sp, 0x44, %o1
    sll     %o0, 2, %o2
    add     %o2, 4, %o2
    add     %o1, %o2, %o2
    sethi   %hi(0x4000), %o3
    save    %sp, -152, %sp
    call    0x21bc
    sll     %o1, 4, %o1
    add     %o3, 5, %o3
    add     %o1, %o2, %o4
    sethi   %hi(0x8000), %o2
    save    %sp, -104, %sp
    call    0x22c0
    return

which yields a string of operations:

    op    add  sll  add  add  sethi  save  call  sll  add  add  sethi  save  call  return
    pos   0    1    2    3    4      5     6     7    8    9    10     11    12    13



The suffix tree for this little program can be found in Figure 3.5, and is printed as a prefix 
traversal of the nodes. Each line of the figure corresponds to a node, and the string of 
instructions listed on each line is the label on the edge leading into the node. Numbers on 
the left hand side indicate the level (depth) of the node ("0" indicates "root"), and numbers 
on the right mark the terminal nodes for each suffix starting at that position in the original 
string. All other nodes are internal, and are also referred to as "non-terminals". 

Our list of substrings uses this last set of numbers (Table 3.1). Concatenating the 
labels encountered during a downward traversal from the root to any non-terminal forms 
the substring; all terminals descending from the non-terminal provide the starting positions 
of the substring. For instance, the substring "sethi save call" starts in two places: 
string positions 4 and 10. Notice that substrings in one row may be nested within others 
from a different row. Substring positions may be easily converted into real instruction 
locations given the fixed size of SPARC instructions. Later analyses will examine these
substrings, and the data gathered during disassembly will be needed. 



Table 3.1 Substring table from suffix tree

    substring                       starting positions
    add sethi save call             3, 9
    add add sethi save call         2, 8
    sll add add sethi save call     1, 7
    sethi save call                 4, 10
    save call                       5, 11
    call                            6, 12
    add                             0, 2, 3, 8, 9
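
The contents of Table 3.1 can be cross-checked with a short sketch. It is a naive quadratic enumeration, not McCreight's algorithm: it keeps every repeated substring whose occurrences are followed by at least two different symbols, which is the property that makes a substring label a non-terminal node of the suffix tree. The function name and the use of None as an end marker are assumptions made for this illustration.

```python
from collections import defaultdict


def repeated_substrings(ops):
    """Map each branching repeated substring of ops to its list of start positions."""
    n = len(ops)
    positions = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            positions[tuple(ops[i:j])].append(i)

    table = {}
    for sub, starts in positions.items():
        if len(starts) < 2:
            continue  # occurs once: corresponds to a terminal, not a non-terminal
        # Symbols that follow each occurrence (None marks the end of the string).
        followers = {ops[s + len(sub)] if s + len(sub) < n else None for s in starts}
        if len(followers) > 1:
            table[" ".join(sub)] = starts
    return table


OPS = "add sll add add sethi save call sll add add sethi save call return".split()
TABLE = repeated_substrings(OPS)
for substring, starts in TABLE.items():
    print(f"{substring:30s} {starts}")
```

The suffix tree yields the same rows without enumerating every substring, which is why it is preferred for real programs.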



    0  {empty}
    1    add
    2      sll add add sethi save call sll add add sethi save call return (0)
    2      sethi save call
    3        sll add add sethi save call return (3)
    3        return (9)
    2      add sethi save call
    3        sll add add sethi save call return (2)
    3        return (8)
    1    return (13)
    1    call
    2      sll add add sethi save call return (6)
    2      return (12)
    1    save call
    2      sll add add sethi save call return (5)
    2      return (11)
    1    sethi save call
    2      sll add add sethi save call return (4)
    2      return (10)
    1    sll add add sethi save call
    2      sll add add sethi save call return (1)
    2      return (7)

Figure 3.5 Suffix tree for example (prefix traversal)

The McCreight algorithm has some important properties. It constructs the suffix tree
using linear space, and executes in linear time. Each string position, which implies each
suffix (from that string position to the end of the string), adds one terminal node and at most
one non-terminal node. Similarly, each suffix adds at most two new edges. Therefore, each
step in the algorithm requires constant time. The algorithm also reduces the contribution of
scanning edge labels to linear time. Each scan tries to determine where to split an edge and
add a non-terminal. Unfortunately, linear time only applies once the edges suitable for
scanning have been found. The effort to find the edge when the alphabet has $n$ (in
Figure 3.5, $n = 6$) characters yields a potential $O(n^2)$ algorithm. A hash table
implementation could bring this down to $O(n \lg n)$. Therefore the algorithm is not linear,
but it does find all substrings, including those nested within others.

3.4 Quick Comparison of Different Methods 

There are some differences in the three methods (Marks [21], FMW [8], and ours) when
examining the strings generated by the procedure search. This examination is necessary
before passing the strings on to the instance grouping heuristics. Table 3.2 summarizes the
differences between the three methods, with each category described below.

1. Different procedure invocation methods: Using an unconditional branch to transfer
control to a procedure is less expensive, in terms of both time and space usage, than using
a subroutine call. However, the former method only works if the last instruction of the
candidate instance is a branch to the same absolute address in every instance. If control flow
proceeds directly to the instruction physically located after the candidate's last, then the
subroutine mechanism must be used. Subroutines will behave correctly only if code for the
candidate leaves a balanced stack (see the next category). Therefore, the instructions in the
procedure determine how it is invoked within the main program.

Invocation methods may be classified by their effect on a compacted program's space 
and speed. "Low space — high time" categorizes Marks' tailored interpretation, where 
subroutines are invoked through one instruction, but only with the support of an interpreter. 
FMW purposely sets out to use a "low space — low time" method, rejecting the use of 
parameters, thereby requiring only one instruction for the procedure invocation. "High 
space — low time" describes our approach, which needs extra instructions to copy 
parameters into registers when invoking a procedure. 

2. Check for stack balancing: With most machine architectures, a subroutine call places
registers and return addresses onto the user stack. If the code for a candidate does not push
the same number of items that it pops, then the stack is returned to the caller in a state
different from that at call time, and the subroutine code is called "unbalanced". Aggressive
balance checking schemes will also try to prune candidates, removing offending
instructions from either end until the stack is guaranteed to be balanced. A further
complication is that code may expect data at a certain depth from the top of the stack, but
pushing a compacted routine's return address changes that depth. The compacted
procedure instructions that use the stack must be changed to reflect the new depth.

3. Maximum number of parameters per procedure: If two instances differ in one or
more operand positions, and if both of these instances are to be covered by the same
procedure, then the procedure will need parameters. Before the procedure call,
these operand values are copied into parameter passing variables (usually registers) and
used by the subroutine. Procedure results may be returned via these parameters, so their
values must be copied back into their proper locations before processing continues with the
instruction following the call. We expect to see more "nearly identical" matches than
"identical" ones, so parameters appear unavoidable. In the past, some researchers have
rejected parameters out of hand as "too expensive" (FMW), while another allows a single
parameter (Marks, tailored interpretation). The actual expense of parameterization has not
been examined.



Table 3.2 Possibilities during candidate examination

    compaction method                                        Marks   FMW   Ours
    1. Different procedure invocation methods                no      yes   yes
    2. Check for stack balancing                             yes     yes   yes
    3. Maximum number of parameters per created procedure    1       0     depends on heuristic



We are most interested in item three. If there are too many parameters, then the cost of their 
use eliminates any space savings. If there are too few, then many compaction candidates 
will be rejected. We are interested in determining the minimum number of parameters that 
will lead to space-savings when compacting a set of instruction instances. A parameterizing 
heuristic could fix the maximum number of parameters, rejecting all those exceeding this 
maximum but at the risk of finding no space savings. Conversely, another heuristic could 
allow as many parameters as needed as long as space savings are the result, but this might 
produce smaller savings than if the number of parameters were fixed to a small number. 
These possibilities are investigated in the next chapter. 
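
The trade-off can be made concrete with a small sketch under an assumed cost model, which is not taken from the measurements in this work: every instruction costs one unit, each call site costs one call plus one copy instruction per parameter, and the procedure body costs its own length plus one return. Delay slots and copy-back of results are ignored. All names are hypothetical.

```python
def space_savings(instances, length, params, call_cost=1, return_cost=1):
    """Instructions saved by replacing `instances` copies of a `length`-instruction
    substring with calls to one procedure taking `params` parameters."""
    original = instances * length
    compacted = (length + return_cost) + instances * (call_cost + params)
    return original - compacted


def max_profitable_params(instances, length):
    """Largest parameter count that still saves space, or -1 if none does."""
    best = -1
    while space_savings(instances, length, best + 1) > 0:
        best += 1
    return best


# Six instances of a four-instruction candidate.
for p in range(4):
    print(p, space_savings(6, 4, p))
print("limit:", max_profitable_params(6, 4))
```

Under these assumptions each extra parameter costs one instruction per call site, so the break-even point depends on both the candidate length and the number of instances.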

3.5 Summary

We have presented our view of the compaction problem, and have suggested combining
parameterization with the grouping of matches into procedures as a way to outperform a
scheme prohibiting parameters. Our task is to develop these grouping algorithms such that
parameter costs are minimized and space savings maximized. Some of the preliminary
requirements for compacting binaries were reviewed, including a description of those
programs which cannot be guaranteed correct after compaction. Finally, we reviewed the
substring construction algorithm (suffix tree) that provides the raw material from which to
choose procedures. Not all substrings can be used, and we presented a few criteria that must
be met. Now it is time to inquire into the benefits of parameters, at the same time examining
heuristics which can select procedures such that parameter costs are minimized.

4 Searching for Subroutines

Our compaction method for object code (binaries) has two phases: 

• converting the binary into a form in which we can locate repeating code sections; 
and 

• examining this code for suitability as procedures, rejecting or transforming them as 
needed. 

The previous chapter covered the first phase and this chapter examines the second. Given 
a set of instruction strings that are similar to each other, we will now suggest some 
heuristics that can convert these substrings into space saving procedures even if the strings 
are not identical. A heuristic approach avoids exhaustive searches of the partitioning 
solution space. This chapter continues with the introduction of some notation, followed by 
a quick overview of the criteria which compaction candidates must satisfy. The procedure 
parameterization problem is presented, and several relations are introduced that will 
support our heuristic algorithms. Finally, the heuristic algorithms themselves are 
motivated, presented, and their performance examined. 

4.1 Definitions 

In the past chapters, we have used such terms as "substrings", "instructions" and
"operands" in an informal manner. To reduce ambiguity, we introduce these definitions.

• If $x$ is an instruction, then define $|x|$ to mean the number of operands contained in the
instruction. If $x$ is a string, then define $|x|$ to be the number of instructions in the string.

• Define $I$ to be an instruction consisting of an operation $op$ and a (possibly empty) list
of $|I|$ operands, $\omega_1, \omega_2, \ldots, \omega_{|I|}$. For instance, the instruction "sll %o0, 2, %o2" has
$op$ = sll, $|I| = 3$, $\omega_1$ = %o0, $\omega_2$ = 2, and $\omega_3$ = %o2. The "no operation" instruction
"nop" is represented by $op$ = nop and $|I| = 0$ (i.e. no operands). If there is no ambiguity,
the operation for instruction $I'$ is $op'$, and the instruction's $i$-th operand is $\omega'_i$.

• Define program $P$ to be a sequence of $m$ instructions $I_1, I_2, \ldots, I_m$. A substring starting
at position $i$ and with length $L$ is denoted by $I_i^L$.

• We write $I' \approx I''$ to denote an instruction match between instructions $I'$ and $I''$, where
the two instructions may have different operands. This is the same as writing:

    $|I'| = |I''|$
    and $op' = op''$

• An exact match occurs between instructions $I'$ and $I''$, and is written as $I' = I''$, if:

    $|I'| = |I''|$
    and $op' = op''$
    and $\omega'_i = \omega''_i$ for all $i$, $1 \le i \le |I'|$

• Two substrings, $I_i^L$ and $I_j^L$, are said to be instances of a (procedure) candidate if they
do not overlap and if each operation is the same between instances; however, operand
values may differ between instances. The number of instances in a candidate $C$ is equal
to $|C|$. Using our notation, two substrings are instances if:

    $\{i, i+1, \ldots, i+L-1\} \cap \{j, j+1, \ldots, j+L-1\} = \emptyset$
    and $I_{i+k} \approx I_{j+k}$ for all $k$, $0 \le k \le L-1$

• Define a partition component $\kappa$ ("component" for short) to be a subset of instance
indices from a procedure candidate (i.e. $\kappa \subseteq \{1, \ldots, |C|\}$). Each component is associated
with a procedure body, and each instance in the component may be replaced in the
original program by a branch or call to the associated procedure. There will be cases where
a candidate instance will be excluded from any procedure, and left as found in the
original program, because of the cost of the instance's inclusion. These excluded instances
make up $\kappa_e$.

• Define a partition $\Pi$ of the procedure candidate $C$ to be a group of disjoint components.
Each component corresponds to one procedure body, and component members are
instances that are replaced by branches or calls to the procedures. If the partitioning does
not cover all instances, then those excluded are left as found in the original object file.
Using the notation for a partition, we define a partitioning $\Pi$ containing $n$ components as
$\Pi = \{\kappa_e, \kappa_1, \kappa_2, \ldots, \kappa_n\}$, with the components satisfying the following conditions:

    $i \ne j \Rightarrow \kappa_i \cap \kappa_j = \emptyset$ and
    $\kappa_e \cap \kappa_i = \emptyset$, $1 \le i \le n$ and
    $\kappa_e \cup \bigcup_{j=1}^{n} \kappa_j = \{1, \ldots, |C|\}$
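
The match relations and the instance test translate directly into code. The following is a minimal sketch; the Instruction class and the function names are invented for this illustration, and positions are 0-based to agree with the operation string of Section 3.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    op: str
    operands: tuple = ()


def matches(a, b):
    """Instruction match: same operation and same operand count."""
    return a.op == b.op and len(a.operands) == len(b.operands)


def exact_match(a, b):
    """Exact match: an instruction match whose operands are also identical."""
    return matches(a, b) and a.operands == b.operands


def are_instances(program, i, j, length):
    """True if the substrings at i and j (both of `length`) do not overlap and match pairwise."""
    if max(i, j) + length > len(program):
        return False
    if set(range(i, i + length)) & set(range(j, j + length)):
        return False
    return all(matches(program[i + k], program[j + k]) for k in range(length))


PROGRAM = [
    Instruction("add", ("%sp", "0x44", "%o1")),
    Instruction("sll", ("%o0", "2", "%o2")),
    Instruction("add", ("%o2", "4", "%o2")),
    Instruction("add", ("%o1", "%o2", "%o2")),
    Instruction("sethi", ("%hi(0x4000)", "%o3")),
    Instruction("save", ("%sp", "-152", "%sp")),
    Instruction("call", ("0x21bc",)),
    Instruction("sll", ("%o1", "4", "%o1")),
    Instruction("add", ("%o3", "5", "%o3")),
    Instruction("add", ("%o1", "%o2", "%o4")),
    Instruction("sethi", ("%hi(0x8000)", "%o2")),
    Instruction("save", ("%sp", "-104", "%sp")),
    Instruction("call", ("0x22c0",)),
    Instruction("return"),
]

print(are_instances(PROGRAM, 1, 7, 6))
print(exact_match(PROGRAM[1], PROGRAM[7]))
```

The two six-instruction substrings are instances of one candidate even though none of their instructions match exactly, which is the case that parameterization is meant to handle.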

4.2 Visualizing Operands

Thinking about the parameterization problem is difficult when the number of candidate
instances is large. An easier method for visualizing the relationship of operand values
between instances is to convert the candidate instances into an array of operand values.
Each row of the array corresponds to a candidate instance, and a column contains the
values of operands at one specific position across all instances. Operand text is replaced by
an ordinal value (i.e. "v1" for "value 1"). Our implementation leaves out the instruction
operations from the array, as these are guaranteed (as per our restriction that instructions
must be candidate matches) to be the same between instances. However, as an aid to
connecting array rows with candidate instances, the instruction operations are shown
in our figures in separate, unnumbered columns. After substituting ordinal values for
operand text from Figure 3.2 of Section 3.1, we have the array representation in Figure 4.1.
Row and column indices are provided to improve readability. Scanning down each of the
eight operand columns, we see that one parameter is needed for the operands in column 4,
and this corresponds to the first operand of the first "mov" instruction. Since all other
operands have the same values across all instances, they do not require parameterization.

                 operands
    inst        1    2    3         4    5         6    7        8
    i1    ld    v1   v2   v1   mov  v3   v4   mov  v5   v6   b   v7
    i2    ld    v1   v2   v1   mov  v8   v4   mov  v5   v6   b   v7
    i3    ld    v1   v2   v1   mov  v9   v4   mov  v5   v6   b   v7
    i4    ld    v1   v2   v1   mov  v8   v4   mov  v5   v6   b   v7
    i5    ld    v1   v2   v1   mov  v9   v4   mov  v5   v6   b   v7
    i6    ld    v1   v2   v1   mov  v10  v4   mov  v5   v6   b   v7

Figure 4.1 One parameter example: $\Omega$-array form

We call this two dimensional array the $\Omega$-array. Instances of the same candidate,
made up of $n$ substrings, are transformed into an array containing only the ordinal values
for operand values. Row $i$ contains the $\omega$ values for the instruction operands in the
$i$-th substring, which starts at position $p$ of the program (call it $I_p^L$).

Operand values are positioned in the row in the same order in which they appear
within instructions and between instructions. That is, if $\Omega_{i,j} = \omega'$, $\Omega_{i,k} = \omega''$, and $\omega'$
appears before $\omega''$ in $I_p^L$ (either because $\omega'$ is in an earlier instruction, or appears
earlier in the same instruction) then $j < k$ must hold. Since we know how many operands
may be found in each of the substring's instructions (i.e. $|I_p|, |I_{p+1}|, \ldots, |I_{p+L-1}|$), any
parameter choice based on the analysis of the $\Omega$-array can be related back to the instance's
instructions and their associated operands.
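
A minimal sketch of this construction follows. The operand text of the six instances is invented so that the resulting ordinals reproduce the pattern of Figure 4.1; the function names are likewise assumptions for this illustration.

```python
def build_omega_array(instances):
    """Turn instances (lists of (op, operands) pairs) into rows of ordinal values.

    Ordinals are assigned in order of first appearance, scanning row by row.
    """
    ordinals = {}
    rows = []
    for instance in instances:
        row = []
        for _op, operands in instance:
            for text in operands:
                if text not in ordinals:
                    ordinals[text] = f"v{len(ordinals) + 1}"
                row.append(ordinals[text])
        rows.append(row)
    return rows


def parameter_columns(omega):
    """1-based columns whose value differs between rows and so need a parameter."""
    width = len(omega[0])
    return [j + 1 for j in range(width) if len({row[j] for row in omega}) > 1]


def make_instance(constant):
    # Hypothetical operand text; only the first mov operand varies.
    return [
        ("ld", ["%o0", "4", "%o0"]),
        ("mov", [constant, "%o1"]),
        ("mov", ["%g0", "%o2"]),
        ("b", ["0x4000"]),
    ]


OMEGA = build_omega_array([make_instance(c) for c in ["1", "2", "3", "2", "3", "5"]])
for row in OMEGA:
    print(" ".join(row))
print("parameters needed in columns:", parameter_columns(OMEGA))
```

Because operand counts per instruction are fixed within a candidate, a column index maps back to one operand of one instruction in every instance.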

As stated earlier, we require instruction operations to be identical between
substrings. A more flexible scheme relaxes this restriction, allowing differences
in operations between instances. Given this relaxation, Geschke's "strongly similar
subroutine" optimization [9] could be implemented. Parameterizing for operation and
operand differences, with the parameter value used to select a compacted procedure
statement from the set of possibilities, could produce procedures that cover very long
strings. However, this would add another dimension to the parameterization problem, and
even more space would be required both at the procedure call site and in the procedure body
itself for parameters. Future work could examine the space savings when allowing for
instruction operation and operand mismatches. However, we will concentrate exclusively
on operand mismatches.

4.3 $\Omega$-array Partitions

4.3.1 Motivation

So far we have visually identified the positions of parameters. However, suppose we must
parameterize something moderately large, as in Figure 4.2. What partitioning produces the
best compaction? If instances i1 and i4 are placed in $\kappa_e$ (i.e. left uncompressed), then the
other five instances could be covered by one procedure using 5 parameters. If they are not
excluded, then the procedure would need 12 parameters, and space savings would be very
unlikely. Before finding the minimum number of partitions, we must find out how many
operand values are identical in each column. This is a scan that proceeds up and down (i.e.
vertically) each of the columns of the $\Omega$-array, partitioning the set of instances into
components with identical operand values.

                 operands
    inst        1    2    3          4    5    6         7    8    9          10   11   12
    i1    ld    v1   v2   v3    mov  v4   v5   v6   ld   v7   v6   v3    mov  v4   v8   v6
    i2    ld    v3   v9   v10   mov  v9   v11  v3   ld   v3   v9   v12   mov  v9   v13  v3
    i3    ld    v3   v9   v10   mov  v9   v11  v3   ld   v3   v9   v12   mov  v9   v13  v3
    i4    ld    v3   v4   v14   mov  v4   v5   v3   ld   v3   v9   v15   mov  v4   v16  v3
    i5    ld    v3   v4   ?     mov  v9   ?    v3   ld   v3   v9   ?     mov  v9   ?    v3
    i6    ld    v3   v4   ?     mov  v9   ?    v3   ld   v3   v9   v22   mov  v9   ?    v3
    i7    ld    v3   v4   ?     mov  v9   ?    v3   ld   v3   v9   ?     mov  v9   v26  v27

Figure 4.2 Complicated partitioning: $\Omega$-array

Another (somewhat contrived) example is shown in Figure 4.3. Scanning down each
column reveals that there are no operand matches between instances. However, only one
operand value appears in each instance. A procedure covering all instances need only
receive one parameter, and all operand positions are tied together by the same value. We
have reduced the number of parameters from 7 to 1. As it is possible to reduce the cost of
parameters by tying together column positions sharing the same value within an instance,
we scan each row of the $\Omega$-array from left to right (i.e. horizontally), partitioning the set of
operand positions into components with identical operand values.

                     operands
    instance        1    2    3         4              5    6        7
    i1        ld    v1   v1   v1   tst  v1   nop  sll  v1   v1   b   v1
    i2        ld    v2   v2   v2   tst  v2   nop  sll  v2   v2   b   v2
    ...
    i6        ld    v6   v6   v6   tst  v6   nop  sll  v6   v6   b   v6

Figure 4.3 Tying example: $\Omega$-array



An exhaustive search of the partitioning solution space will always find the 
parameterization producing the best space savings, but this is not efficient. A heuristic 
algorithm still seems attractive, and we could use the components of vertical and horizontal 
partitions to guide our search. Vertical partition components appearing many times in array 
columns will suggest candidate groupings producing procedures with fewer parameters 
than those components which appear only a few times. Similarly, horizontal partition 
components containing a large number of operand positions and many instance 
appearances will reduce parameterization by eliminating the duplication of parameter 
values. 

4.3.2 Constructing Vertical Partitions

Vertical partitions span all instances of a candidate, and the members of each partition
component are row indices from the $\Omega$-array. These members have the same operand value
in a specific column (operand position) and therefore are grouped together. Values such as
a vertical component's size (number of instances), and the number of times in which the
component appears in a partitioning for a column (number of columns) are used by the
heuristics. As these two numbers grow larger, the corresponding component becomes a
more desirable guide for grouping instances into procedures.

Our main interest is in any vertical partition's components. Each component is
uniquely determined by the $\Omega$-array rows that it covers (instances), and this "name" is
stored in another array called $\Gamma$. This name refers to a set of array row indices. Each entry
of $\Gamma$ has a corresponding entry in the array $G$. Elements of $G$ are sets of $\Omega$-array columns
(operand positions) in which the component appears as part of the vertical partitioning.
Therefore, $\Gamma$ and $G$ are arrays of sets, with each set of rows ($\Gamma_z$) associated with a set of
columns for which the instances have the same $\omega$-values ($G_z$).

An algorithm to construct the vertical partitions and store the components, given an
$\Omega$-array, is shown in Figure 4.4, and the results can be found in Table 4.1.

    procedure buildVerticalPartitions($\Omega$, VAR $\Gamma$, VAR $G$)
     1  Initialize all elements of arrays $\Gamma_z$, $G_z$ to $\emptyset$
     2  for $j$ := 1 to numOperands          /* number of operands in instance */
     3      $T := \{1, 2, \ldots, |C|\}$
     4      while $T \ne \emptyset$ do
     5          select any $t \in T$
     6          $U := \{t\}$
     7          for each $i$ such that $\Omega_{i,j} = \Omega_{t,j}$
     8              $U := U \cup \{i\}$
     9          find $z$ such that $\Gamma_z = U$
    10          if not found, construct such $\Gamma_z$ at next unused value of $z$
    11          $G_z := G_z \cup \{j\}$
    12          $T := T - U$
    13  return $\Gamma$, $G$

Figure 4.4 Algorithm 2: Finding Vertical Partition Components

After initializing all of the elements (line 1) in $\Gamma$ and $G$, the algorithm examines each
operand column of the $\Omega$-array in turn (line 2). All instances with the same $\omega$-value in this
column are grouped together, one unique value at a time (lines 3 to 12), forming the vertical
partition of column $j$. Each component ($U$) is used to find a $z$ such that $\Gamma_z = U$ (line 9). If
this is the first time that a component equal to $U$ has appeared for this $\Omega$-array, then $U$ is
added to the $\Gamma$-array in the next open position (line 10). The column index $j$ is added to
$G_z$ (line 11), indicating that the component described by $\Gamma_z$ appears in the vertical
partitioning for this column. All components for the column are found before proceeding
to the next column (lines 3, 5, 12).
After processing all columns, the two arrays of sets are returned to the caller.
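
The steps of Figure 4.4 can be sketched as runnable code. This is an illustrative transcription rather than the original implementation: the data layout (parallel Python lists standing in for the two arrays of sets) and the names are assumptions, and the input is the fully known array of Figure 4.1.

```python
def build_vertical_partitions(omega):
    """Return (gamma, g): gamma[z] is a set of 1-based row indices (a component),
    g[z] is the set of 1-based columns in whose vertical partition it appears."""
    gamma, g = [], []
    num_rows, num_operands = len(omega), len(omega[0])
    for j in range(1, num_operands + 1):
        remaining = set(range(1, num_rows + 1))
        while remaining:
            t = min(remaining)  # "select any t"; min() keeps the output deterministic
            value = omega[t - 1][j - 1]
            u = frozenset(i for i in remaining if omega[i - 1][j - 1] == value)
            if u not in gamma:  # first appearance of this component
                gamma.append(u)
                g.append(set())
            g[gamma.index(u)].add(j)
            remaining -= u
    return gamma, g


FIGURE_4_1 = [
    "v1 v2 v1 v3 v4 v5 v6 v7".split(),
    "v1 v2 v1 v8 v4 v5 v6 v7".split(),
    "v1 v2 v1 v9 v4 v5 v6 v7".split(),
    "v1 v2 v1 v8 v4 v5 v6 v7".split(),
    "v1 v2 v1 v9 v4 v5 v6 v7".split(),
    "v1 v2 v1 v10 v4 v5 v6 v7".split(),
]

GAMMA, G = build_vertical_partitions(FIGURE_4_1)
for rows, cols in zip(GAMMA, G):
    print(sorted(rows), sorted(cols))
```

The component covering all six rows appears in seven of the eight columns, which is why a single procedure with one parameter (column 4) covers every instance.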

Running this algorithm on the $\Omega$-array of Figure 4.2 produces the components and
component appearances shown in Table 4.1.

Table 4.1 Vertical components for candidate in Figure 4.2

                                     $G_z$ (component appearances)
    z    $\Gamma_z$ (components)     before closure         after closure
    1    i1                          1,2,3,6,7,8,9,11,12    1,2,3,6,7,8,9,11,12,4,10,5
    2    i2,i3,i4,i5,i6,i7           1,6,7,8                1,6,7,8
    3    i2,i3                       2,3,5,9,11             2,3,5,9,11,7,8,4,10,1,6,12
    4    i4,i5,i6,i7                 2                      2,1,6,7,8
    5    i4                          3,9,11                 3,9,11,1,2,4,5,6,7,8,10,12
    6    i5                          3,5,9,11               3,5,9,11,1,2,4,6,7,8,10,12
    7    i6                          3,5,9,11               3,5,9,11,1,2,4,6,7,8,10,12
    8    i7                          3,5,9,11,12            3,5,9,11,1,2,4,6,7,8,10,12
    9    i1,i4                       4,5,10                 4,5,10
    10   i2,i3,i5,i6,i7              4,10                   4,10,1,6,7,8
    11   i2,i3,i4,i5,i6              12                     12,1,6,7,8

By definition, all $\Gamma_z$ values for an operand column are disjoint, but $\Gamma$-array entries
may be subsets of other components in the $\Gamma$-array. For
instance, $\Gamma_2$ is {i2, i3, i4, i5, i6, i7}, and this component can be found in the
partitioning of columns 1, 6, 7 and 8. Therefore, we expect that instances from any $\Gamma$-array
entry which is a subset of $\Gamma_2$ will have the same value in these columns. After performing
a closure operation, the $G$-array entries corresponding to {i2, i3}, {i4, i5, i6, i7},
{i4}, {i5}, {i6}, {i7}, {i2, i3, i5, i6, i7}, and {i2, i3, i4, i5, i6} will
contain columns 1, 6, 7 and 8. The closure operation can be applied to the $G$-array
elements either during or after partition component construction.
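
One reading of the closure operation, inferred from the example above rather than from a stated definition, is that each component inherits the columns of every component that contains it. The sketch below applies that reading to the "before closure" column of Table 4.1; the function name is hypothetical.

```python
def close_columns(gamma, g):
    """For each component, add the columns of every component that is a superset of it."""
    closed = []
    for rows_z, cols_z in zip(gamma, g):
        cols = set(cols_z)
        for rows_y, cols_y in zip(gamma, g):
            if rows_z <= rows_y:  # rows_z is a subset of rows_y
                cols |= cols_y
        closed.append(cols)
    return closed


# Components and appearances before closure, copied from Table 4.1 (instance iN -> N).
GAMMA = [{1}, {2, 3, 4, 5, 6, 7}, {2, 3}, {4, 5, 6, 7}, {4}, {5}, {6}, {7},
         {1, 4}, {2, 3, 5, 6, 7}, {2, 3, 4, 5, 6}]
G_BEFORE = [{1, 2, 3, 6, 7, 8, 9, 11, 12}, {1, 6, 7, 8}, {2, 3, 5, 9, 11}, {2},
            {3, 9, 11}, {3, 5, 9, 11}, {3, 5, 9, 11}, {3, 5, 9, 11, 12},
            {4, 5, 10}, {4, 10}, {12}]

G_AFTER = close_columns(GAMMA, G_BEFORE)
for z, cols in enumerate(G_AFTER, start=1):
    print(z, sorted(cols))
```

Under this reading the sketch reproduces the "after closure" column of Table 4.1 as sets; the ordering of columns within each table entry is not modelled.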

4.33 Constructing Horizontal Partitions 

Vertical partitions suggest candidate groupings through components, as each component is
a possible grouping of instances to a procedure. Since the instances of the component Γ_z
have the same operand values in the columns contained in G_z, the instances may be covered
with a procedure that does not need parameters for the columns in G_z. The remaining
columns ({1, ..., numColumns} − G_z) do require parameters, but the values to be passed in
can be tied together where they always agree, so that fewer parameters are required.

For each instance (Ω-array row), operand positions (Ω-array columns) with the same
value are partitioned together. Components with a single member are operand values that
appear only once within the instance (singletons), and are ignored when reporting ties.
Table 4.2 contains the horizontal partitionings for the Ω-array of Figure 4.2. Their
construction is similar to the process for vertical partitions.
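
As a minimal sketch of this construction, the Python fragment below groups the column positions of a single instance by operand value. The function names and the operand row are hypothetical and serve only as illustration; they are not taken from Figure 4.2.

```python
from collections import defaultdict


def horizontal_partition(row):
    """Group the 1-based column indices of one instance by operand value."""
    groups = defaultdict(set)
    for col, value in enumerate(row, start=1):
        groups[value].add(col)
    # Order components by their smallest column for a stable result.
    return sorted(groups.values(), key=min)


def reportable_ties(partition):
    """Singletons carry no tie information, so they are dropped."""
    return [comp for comp in partition if len(comp) > 1]


# Hypothetical operand row with seven operand positions.
row = ["%o0", "%l1", "8", "%l1", "%g1", "%o0", "%o0"]
partition = horizontal_partition(row)
print(partition)
print(reportable_ties(partition))
```

Here columns 1, 6 and 7 share one value and columns 2 and 4 share another, so only those two components are reported as ties.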



Table 4.2 Horizontal partitions for example in Figure 4.2

 instance   horizontal partition
 i1         {1} {2} {3, 9} {4, 10} {5} {6, 8, 12} {7} {11}
 i2         {1, 6, 7, 12} {2, 4, 8, 10} {3} {5} {9} {11}
 i3         {1, 6, 7, 12} {2, 4, 8, 10} {3} {5} {9} {11}
 i4         {1, 6, 7, 12} {2, 4, 8, 10} {3} {5} {9} {11}
 i5         {1, 6, 7, 12} {2, 8} {4, 10} {3} {5} {9} {11}
 i6         {1, 6, 7, 12} {2, 8} {4, 10} {3} {5} {9} {11}
 i7         {1, 6, 7} {2, 8} {4, 10} {3} {5} {9} {11} {12}



Given the horizontal partitions for each instance, we wish to find the ties that are valid
for all instances of a vertical partition component. In our notation, capital letters are
horizontal partitions, and lower case letters are their components. Given two partitions
satisfying our definition in Section 4.1:

$$Q = \{q_1, q_2, \ldots, q_n\}, \qquad R = \{r_1, r_2, \ldots, r_m\}$$

where each of the components q and r is a set of column indices, a partition that is valid
for both of them is called T, and defined as:

$$T = \{\, q \cap r \mid q \in Q,\ r \in R \,\}$$

We write the operation implementing this relation as:

$$T = Q \cap R$$

As an example, we take one possible grouping of instances from Figure 4.2: {i3,
i5, i7}. The following horizontal sets are taken from Table 4.2. We wish to know what
ties will be valid for a procedure covering all instances of this partition. Given:

 i3:  {1, 6, 7, 12} {2, 4, 8, 10} {3} {5} {9} {11}
 i5:  {1, 6, 7, 12} {2, 8} {4, 10} {3} {5} {9} {11}
 i7:  {1, 6, 7} {12} {2, 8} {4, 10} {3} {5} {9} {11}

then the computation of T (where T happens to coincide with the value for i7) produces:

 T:   {1, 6, 7} {12} {2, 8} {4, 10} {3} {5} {9} {11}

Using Venn diagrams, one can see the "∩" operation applied to the different horizontal
partition components in the computation of i3 ∩ i7 (Figure 4.5). Each of the intersections
becomes a component of the tied partition. Any column that is left alone in an intersection
is an operand position that cannot safely share a parameter with any other position. For
instance, i3 has the same operand values in columns 1, 6, 7, and 12; i7 does not include
column 12 as part of the set. Therefore any tieing applied to both i3 and i7 must give
column 12 its own parameter, otherwise the tie will copy an incorrect value into operand
12 for instance i7.
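
The pairwise operation can be sketched in Python as follows. The function name meet is ours, and the sketch assumes that empty intersections are simply discarded, which the set-builder definition leaves implicit. The two partitions are those listed for i3 and i7.

```python
def meet(q_parts, r_parts):
    """Intersect every component of Q with every component of R.

    Assumption: empty intersections are not components and are dropped.
    """
    result = []
    for q in q_parts:
        for r in r_parts:
            common = q & r
            if common:
                result.append(common)
    return result


I3 = [{1, 6, 7, 12}, {2, 4, 8, 10}, {3}, {5}, {9}, {11}]
I7 = [{1, 6, 7}, {12}, {2, 8}, {4, 10}, {3}, {5}, {9}, {11}]

T = meet(I3, I7)
print(T)
```

Column 12 ends up in a component of its own, which is the reason it needs a separate parameter when i3 and i7 share a procedure.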




Figure 4.5 Finding a valid tieing partition

Horizontal partitions are needed to find parameter ties, but parameter ties provide no 
clue as to which candidate groupings save the most space: parameter tieing makes a good 
situation better. However, even a good tie partition will not reduce the number of
parameters if the operand columns it covers do not need parameters. 

/* The horizontal partitions H for the candidate and the
 * candidate partitioning component κ, also known as a
 * candidate grouping, are passed in.
 * Valid ties for all instances are returned in T.
 */
procedure findValidTies(H, κ, VAR T)
    select any i ∈ κ;
    κ' := κ − {i};
    T := H_i;
    for each i such that i ∈ κ' do
        T := T ∩ H_i;
    return T

Figure 4.6 Algorithm 3 — Find tieing valid for partition instances 
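
A runnable reading of Algorithm 3 is sketched below in Python. It assumes that H maps each instance name to its horizontal partition and that T starts from the partition of one member of the grouping; the names find_valid_ties and meet, and the dictionary layout, are illustrative choices rather than part of the original implementation.

```python
from functools import reduce


def meet(q_parts, r_parts):
    """Pairwise intersection of two partitions, keeping non-empty results."""
    return [q & r for q in q_parts for r in r_parts if q & r]


def find_valid_ties(H, kappa):
    """Fold the horizontal partitions of all instances in the grouping.

    H     -- dict: instance name -> list of column sets (assumed layout)
    kappa -- non-empty list of instance names forming the grouping
    """
    return reduce(meet, (H[i] for i in kappa))


# Horizontal partitions for i3, i5 and i7 as given in Table 4.2.
H = {
    "i3": [{1, 6, 7, 12}, {2, 4, 8, 10}, {3}, {5}, {9}, {11}],
    "i5": [{1, 6, 7, 12}, {2, 8}, {4, 10}, {3}, {5}, {9}, {11}],
    "i7": [{1, 6, 7}, {2, 8}, {4, 10}, {3}, {5}, {9}, {11}, {12}],
}

T = find_valid_ties(H, ["i3", "i5", "i7"])
shared = [comp for comp in T if len(comp) > 1]
print(shared)
```

Only the components with more than one column describe usable ties; for this grouping they are {1, 6, 7}, {2, 8} and {4, 10}.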



With parameter ties valid for all component instances, we can calculate the cost of any 
mapping of candidate instances to procedures. Finding the components leading to the best 
space savings is the difficult part, and we look at this next. 

4.4 Heuristics 

The main motivation for using vertical and horizontal partitions is to provide a starting
point for exploration of the procedure/parameter solution space. A brute force approach
which examines each possible partitioning is expensive because it is combinatorially
explosive. Suppose that the candidate has been converted into an Ω-array. If the number of
rows in the array is m, and the number of components in a partitioning of the candidate is n
(allowed to vary over 1 ≤ n ≤ m), then the number of unordered partitionings on m is:

$$\sum_{n=1}^{m} S(m, n)$$

where S(m, n) is a Stirling number of the second kind [26]:

$$S(m, n) = \frac{1}{n!} \sum_{k=0}^{n} (-1)^{k} \, C(n, k) \, (n - k)^{m} \qquad \text{(EQ 4.1)}$$



Even reasonably sized candidates can produce large numbers of combinations, and our 
heuristics regularly meet up with 10 to 15 instances in a candidate (see Table 4.3). Casting 
the question "how should the candidate instances be partitioned" into a known optimization 
problem also appears to be difficult. 



Table 4.3 Possible number of partitionings

 number of       number of unordered        number of       number of unordered
 instances (m)   partitionings (Eq. 4.1)    instances (m)   partitionings (Eq. 4.1)
 1               1                          8               4140
 2               2                          9               21147
 3               5                          10              115,975
 4               15                         15              1,382,958,545
 5               52                         20              ~5.172 x 10^13
 6               203                        25              ~4.638 x 10^18
 7               877                        30              ~8.467 x 10^23
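
The counts in Table 4.3 can be checked with a few lines of Python. The sketch below evaluates the Stirling number formula with exact integer arithmetic and sums it over n; the function names are ours.

```python
from math import comb, factorial


def stirling2(m, n):
    """Stirling number of the second kind S(m, n) via the explicit sum."""
    total = sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(n + 1))
    return total // factorial(n)  # the sum is always divisible by n!


def partitionings(m):
    """Number of unordered partitionings of m instances (sum over n)."""
    return sum(stirling2(m, n) for n in range(1, m + 1))


for m in (1, 2, 3, 4, 5, 8, 10, 15):
    print(m, partitionings(m))
```

Even at m = 15, which our heuristics regularly encounter, the count already exceeds 10^9, which is why exhaustive enumeration is not attempted.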



In this section we do not aim for optimal compaction. Instead we attempt to find space 
saving opportunities that other compaction algorithms (Marks, FMW) pass over. For this 
more modest goal, we may appeal to heuristics. Our basic rule of thumb is that a grouping 
of instances to a procedure should (a) cover as many instances as possible while (b) 
resulting in as few parameters as possible. Requirement (a) maximizes the ratio of released 
space to procedure body space, and (b) improves the space savings at the original instance 
positions. 

4.4.1 Calculating Space Savings 

Each heuristic is evaluated based on the space savings potential of its generated groupings. 
Computing space savings on architectures with variably sized instructions (e.g. Intel 
80x86, Motorola 680x0) must take into account the size of the operands. However, SPARC 
instructions have a fixed size, which simplifies the space savings calculation. Parameter 
costs are also slightly reduced on the SPARC machines because all branch and call 
instructions are paired together with an extra instruction called the delay slot. This slot 
forces a compile-time decision as to which instruction is fed into the processor pipeline 
when there is a change in control flow during program execution. We use this slot to copy
a value into a parameter variable during a procedure invocation. If there are no parameters, 
then the first instruction of the procedure may be placed in the slot. 

There are five parts to the calculation: 

S_u (savings upper bound): No scheme can do better than that which removes all
instances from the code. All of the following costs are subtracted from this upper bound.

C_1 (parameters copied at procedure entry): A procedure call (closed subroutine) or
branch (open routine) may use the delay slot to copy a parameter, so there is no cost
for the first parameter. After the first, all other parameters cost one instruction.

C_2 (parameters copied at procedure exit): One instruction per parameter is needed.
These copy the results from parameter registers to the original variable locations.

C_3 (space needed for procedure body): Procedures are almost the same size as a
candidate instance. Two extra instructions are needed if the candidate is in closed form
(subroutine) for the "return" instruction and its delay slot. No extra instructions are
needed if the procedure is in open form.

C_4 (branch or call instruction): As mentioned in the description of C_1, one
instruction is needed for the transfer of control to the procedure. A second is for the delay
slot.

Each component of the candidate partition Π (except κ_c) requires one procedure, and
therefore contributes a list of costs. Different parameterizations for different partitions
mean different values for C_1, C_2 and C_3. As instances of a candidate use the same procedure
invocation mechanism, C_4 remains fixed. In the equation, C_{i,j} is the cost C_i associated with
the parameterization of κ_j. For n components, the space saving is:

n 

Equation 4.2 is the basis for evaluation of the partitioning heuristics presented in the rest of
this chapter. After finding all the space savings for a binary given the advice of a specific
heuristic, the total space savings for the whole file are divided by the original binary size,
multiplied by 100, and reported as the "% space savings". Results are grouped into four
categories based on the number of lines that are disassembled as code: (a) <2K instructions;
(b) 2K to 5K instructions; (c) 5K to 10K instructions; and (d) >10K instructions. Since we are
investigating the performance of the heuristics, it is important to focus on how well they
work, as distinct from how well the disassembler separates code from data. Improving
disassembler code coverage is important, but as mentioned earlier in the thesis, if we are
only given the binary file and its symbol table, 100% separation of code from data is
expensive.

For simplicity of analysis, there is no nesting or overlapping of procedures. Our results
were compared against the FMW results (in Section 4.4.6) if the number of instructions
saved was greater than 50 and the space savings at least 1%. The order
in which candidates are evaluated by the heuristics does appear to affect the final result. We
tried three different orderings: (a) by maximum potential space savings; (b) by the number of
instances in the candidate; and (c) by length of an instance in the candidate (number of
instructions). The best space savings of the three orderings are used for our reports. Out of
200 UNIX utilities, 174 produced a space savings greater than 1 instruction, and 67
produced a space savings greater than 50 instructions.

4.4.2 FMW Compaction (Base Line) 

A big surprise was the poor performance of the FMW algorithm. Although the 1984 paper
[8] reported space savings ranging from 0 to 39% with an average of 7%, our
implementation of FMW on the SPARC produced far poorer results. This could be due to
several factors. First, the richness of the SPARC register set makes identical matches less
frequent. Second, the paper applied compaction to assembly code where, by definition, all the
code and data are separated. Third, FMW may be affected by the presence of string constants
within the code. These factors may explain the order of magnitude difference between the
paper's results and those of our own implementation. However, this does not rule out
comparison with our heuristics, since the substrings FMW finds will always be found by our
algorithm. Results for the FMW scheme are in Table 4.4.



Table 4.4 FMW compaction: % savings

 instructions   average   min   max
 0-2K           1.5       1.1   1.7
 2K-5K          2.3       1.0   5.4
 5K-10K         2.0       1.0   3.9
 10K+           2.3       1.1   6.1



4.4.3 Heuristic 1 

This scheme attempts to find one procedure by maximizing the number of instances in a
partition component (|Γ_z|) and maximizing the number of operand columns not needing a
parameter. Larger operand column sets (|G_z|) indicate that fewer parameters are needed for
a partitioning based on Γ_z. All instances not included in the procedure are left as found in
the original binary. A pseudocode algorithm appears in Figure 4.7.

procedure heuristic1(Γ, G, VAR κ)
    max := 0;
    maxSet := −1;
    ∀z do
        a := |Γ_z|;    /* number of instances */
        b := |G_z|;    /* number of operand positions */
        if (a * b) > max then
            κ := Γ_z;
            max := a * b;
    return κ;

Figure 4.7 Algorithm 4 — partitioning heuristic 1 
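
Algorithm 4 translates almost directly into Python. The sketch below assumes that the G-array passed in is the one obtained after closure (the pseudocode does not say which version is used) and encodes instances as integers; the function name is ours. The data is taken from Table 4.1.

```python
def heuristic1(gamma, g):
    """Pick the component covering the largest area |Γ_z| * |G_z|."""
    best, best_area = None, 0
    for instances, columns in zip(gamma, g):
        area = len(instances) * len(columns)
        if area > best_area:  # strict test: the first maximum wins
            best, best_area = instances, area
    return best, best_area


ALL = set(range(1, 13))
GAMMA = [
    {1}, {2, 3, 4, 5, 6, 7}, {2, 3}, {4, 5, 6, 7}, {4}, {5},
    {6}, {7}, {1, 4}, {2, 3, 5, 6, 7}, {2, 3, 4, 5, 6},
]
# G after closure, Table 4.1 (assumed input to the heuristic).
G_CLOSED = [
    ALL, {1, 6, 7, 8}, ALL, {1, 2, 6, 7, 8}, ALL, ALL,
    ALL, ALL, {4, 5, 10}, {1, 4, 6, 7, 8, 10}, {1, 6, 7, 8, 12},
]

component, area = heuristic1(GAMMA, G_CLOSED)
print(sorted(component), area)
```

For this data the selected grouping is {i2, i3, i5, i6, i7}, which covers five instances and six parameter-free columns.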

The algorithm is greedy as it finds the partition that covers the largest area of the Ω-array.
If the disassembled binary contains substrings that heavily favor some operand
values over others, then the space savings suggested by the heuristic are high. Results are
poor if the number of instances making up the returned component is large while the
number of operands making up each instance is small. That is, we find better space savings
if we have candidates with long instances (many instructions), but fewer of them in the
component. One major drawback of heuristic 1 is that after making a good compaction
choice, all excluded instances are completely ignored (i.e. placed in κ_c) and are not
re-examined for further compaction opportunities. Experimental results are in Table 4.5.



Table 4.5 Heuristic 1 compaction: % savings

 instructions   average   min   max
 0-2K           2.0       1.1   3.2
 2K-5K          2.2       1.1   4.8
 5K-10K         2.2       1.0   5.3
 10K+           2.6       1.0   8.5



4.4.4 Heuristic 2 

The drawbacks of the previous heuristic suggest another method. Partitionings of the
candidates should contain more than two components (in heuristic 1, we have only
{κ_c, κ_1}). As we are interested in all those components (groups of instances) leading to
procedures with potential for space savings, only those containing three or more instances
are examined. All vertical components are sorted by decreasing number of columns (|G_z|),
so that components with the greatest numbers of parameter-free columns are examined first.
The secondary sort field is the size of the component itself (|Γ_z|), i.e. the number of
instances. Components κ are added to the partitioning Π if they are instance disjoint with
those already in Π (i.e. an instance may not be replaced by two procedures). After
examining the sorted components, all instances excluded from the selected partitions will
remain as found in the original executable. Note that the excluded instances can support no
other space saving partitions, and this is an improvement from heuristic 1. The algorithm
is found in Figure 4.8.

This algorithm is also greedy, but it is allowed to progress further than heuristic 1. It 
is not significantly better (see Table 4.6). As the size of the program grows, it begins to 
perform worse than heuristic 1, and this goes against our intuition. After all, heuristic 2 
should contain all the space savings of the previous scheme. This is not the case, though, 



procedure heuristic2(G, Γ, VAR Π)
    for each z := 1 ... |Γ| do
        if |Γ_z| < 3 then
            /* discard set */
    G' := sort(G, primary key decreasing |G_z|,
                  secondary key decreasing |Γ_z|);
    Γ' := Γ sorted in the same order as G';
    Π := ∅;                       /* start with empty partitioning */
    for each z do                 /* look at each set in sorted order */
        s := Γ'_z;                /* s is a set of instances (rows) */
        if (∃i ∈ s and ∃κ ∈ Π) and (i ∈ κ) then
            ignore s;             /* set contains instance already in Π */
        else
            Π := Π ∪ {s};         /* add set to partitioning */
    return Π;                     /* return the partitioning */



Figure 4.8 Algorithm 5 — partitioning heuristic 2 
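
The following Python sketch shows one way Algorithm 5 might be realized. The component data is hypothetical and chosen so that two disjoint components are selected; the function name, the min_instances parameter and the integer encoding of instances are likewise illustrative.

```python
def heuristic2(gamma, g, min_instances=3):
    """Greedily collect instance-disjoint components.

    Components with fewer than min_instances instances are discarded.
    The rest are visited by decreasing |G_z|, then decreasing |Γ_z|.
    """
    pairs = [(cols, inst) for inst, cols in zip(gamma, g)
             if len(inst) >= min_instances]
    pairs.sort(key=lambda p: (len(p[0]), len(p[1])), reverse=True)

    partitioning, covered = [], set()
    for cols, inst in pairs:
        if inst & covered:
            continue  # shares an instance with a selected component
        partitioning.append(inst)
        covered |= inst
    return partitioning


# Hypothetical components and their parameter-free columns.
GAMMA = [{1, 2, 3}, {4, 5, 6, 7}, {3, 4, 5}, {8, 9}]
G = [{1, 2, 3, 4}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4, 5}]

print(heuristic2(GAMMA, G))
```

The component {8, 9} is discarded for having fewer than three instances, and {3, 4, 5} is skipped because instance 3 is already covered, so instances 8 and 9 remain as found in the original code.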



as heuristic 2 places the highest priority on components with the fewest parameters, as opposed to
heuristic 1 which tries to find the largest possible space savings in one procedure. Even 
though a heuristic 1 procedure may have several parameters, if it has a sufficiently large 
number of instances (and each instance is made up of many instructions), then better space 
savings would result with this rejected partition than with many smaller heuristic 2 
procedures that use fewer parameters. 

Table 4.6 Heuristic 2 compaction: % savings

 instructions   average   min   max
 0-2K           2.0       1.2   3.0
 2K-5K          2.3       1.0   5.7
 5K-10K         2.3       1.0   4.9
 10K+           2.1       1.0   7.3

4.4.5 Heuristic 3

One point of view holds that parameters are too expensive: we are willing to pay at most 
for one or two. Heuristic 3 limits partitions to a maximum number of parameters. The 
method for this scheme (algorithm 6) may be found in Figure 4.9. It is difficult to detect an 

procedure heuristic3(maxParms, G, Γ, VAR Π)
    c := 1;
    for each z do
        n := numParms(G_z, Γ_z);
        if (n ≤ maxParms) then
            N_c := n;
            c := c + 1;
    G'' := sort(G', primary key , secondary key decreasing |N_c|);
    Γ'' := Γ' sorted in the same order as G'';
    /* Algorithm is now similar to latter half of Heuristic #2 */
    Π := ∅;                       /* start with empty partitioning */
    for each c do                 /* look at each set in sorted order */
        s := Γ''_c;               /* s is a set of instances (rows) */
        if (∃i ∈ s and ∃κ ∈ Π) and (i ∈ κ) then
            ignore s;             /* set contains instance already in Π */
        else
            Π := Π ∪ {s};         /* add set to partitioning */
    return Π;                     /* return the partitioning */

Figure 4.9 Algorithm 6 — partitioning heuristic 3 



improvement over the previous heuristics for small files, and some people may find the 
numbers insignificant. We have limited the number of parameters to one, and compaction 
does not appear to have suffered. The results of this heuristic are in Table 4.7. 

A flexible algorithm could tune the maximum number of allowable parameters to the 
size of the candidates — shorter candidates would not be allowed to have any parameters, 

Table 4.7 Heuristic 3 compaction: % savings

 instructions   average   min   max
 0-2K           2.0       1.2   3.0
 2K-5K          1.7       1.0   3.9
 5K-10K         2.0       1.1   4.6
 10K+           2.2       1.1   7.0



while candidates spanning many instructions could produce large enough space savings to 
sustain a bigger parameter cost. The longer the candidate, the greater the probability that 
there will be mismatches between operand values in the same column. 

4.4.6 "Best of heuristic", and percentage improvement over FMW 

The "best of heuristic" is the result of running each heuristic on a candidate, and then 
choosing the partitioning/procedures with the best space savings suggested by any 
heuristic. In many cases the best space savings for a binary are suggested by only one of 
the heuristics, but frequently the "best of heuristic" results for a binary are better than the 
results of any one of the three (see Table 4.8). This suggests that better results will come 
from compaction schemes that are tuned to several candidates, i.e. use one heuristic for 
candidates with many short instances, use another for candidates with fewer but longer 
instances, etc. 
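
A small numerical illustration shows why the "best of heuristic" total can exceed the total of every individual heuristic. The savings figures below are hypothetical (instructions saved per candidate) and the function name is ours.

```python
def best_of(savings_by_heuristic):
    """Sum, over candidates, the best saving suggested by any heuristic.

    savings_by_heuristic maps a heuristic name to a list of savings,
    one entry per candidate, with candidates in the same order.
    """
    per_candidate = zip(*savings_by_heuristic.values())
    return sum(max(row) for row in per_candidate)


# Hypothetical savings (in instructions) for three candidates of one binary.
SAVINGS = {
    "heuristic 1": [12, 0, 7],
    "heuristic 2": [9, 5, 7],
    "heuristic 3": [4, 6, 3],
}

totals = {name: sum(values) for name, values in SAVINGS.items()}
best = best_of(SAVINGS)
print(totals, best)
```

In this example no single heuristic saves more than 21 instructions over the whole binary, while choosing the best heuristic per candidate saves 25.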



Table 4.8 Best of heuristic: % savings

 instructions   average   min   max
 0-2K           2.0       1.1   3.2
 2K-5K          2.4       1.1   6.4
 5K-10K         2.5       1.1   6.6
 10K+           2.7       1.1   9.3



Our final results appear worse if compared with the reported space savings in [8] (4% 
average versus our 2.7% average). However, as mentioned at the start of this section, our 
implementation of the FMW algorithm revealed that its performance was quite poor when 
working with disassembled SPARC code. A more interesting comparison is made on a
binary-by-binary basis, and the FMW results are compared with our own "Best of 
Heuristic". We have calculated the percentage space savings improvement of our scheme 
over FMW and averaged out these results for each of the program size categories. As 
programs become longer, our algorithm outperforms FMW even more. The large variances 
point out that occasionally a program is encountered with plenty of redundancy. Overall, 
however, these results show that better space savings may be achieved if we use parameters 
to change similar strings of instructions into one procedure. 



Table 4.9 "Best of heuristic" improvement over FMW

 instructions   average (%)   min (%)   max (%)
 2K-5K          49.1          13.9      115.4
 5K-10K         60.8          6.8       173.6
 10K+           84.4          17.1      566.7



4.5 Further transformations that could be pursued 

The heuristics given above do not transform the candidates themselves, but only pick and 
choose among candidates. More aggressive schemes should seek to transform the candidate 
instances such that compaction opportunities are improved. 

4.5.1 Dropping instructions from instances 

If the parameters are necessary because of differences in the instructions at the beginning
or end of candidate instances, then these instructions could be dropped. The potential
space savings increase, since the cost of adding parameters for the offending
instruction is significantly higher than the space saved by including the instruction in the
procedure.

4.5.2 Wasp waisting 

This is a variation of the previous technique, but the instruction causing the parameter is 
now in the middle of the candidate. To remedy this, candidates are split in two, and covered 



